If you program in C, you rarely need concern
yourself with the mechanics of arithmetic
in your applications. You simply declare
your variables as long or short,
signed or unsigned, integer or floating. 
You can then trust the compiler to translate 
the arithmetic operators in your source 
code into the proper machine instructions, 
sequences of instructions, or calls to li- 
brary routines. You can even mix data 
types if you wish (multiplying a floating 
point number by an integer, for example); 
the compiler will generate the appropriate 
code to convert (or "coerce") the type of 
one piece of data to match the other. 

In assembly language programming, 
on the other hand, you can't avoid the is- 
sues of computer arithmetic and data typ- 
ing. You must have a solid grasp of signed 
and unsigned two's complement arithmetic,
the CPU's built-in support for the basic
arithmetic operations, and the algorithms 
by which more complex arithmetic opera- 
tions can be constructed out of the avail- 
able machine instructions. 

In the next several columns, we'll ex- 
plore some of these subjects together. We 
will begin with the Intel 80x86's native 
support for single- and double-precision
integer arithmetic, then develop a library 
of variable precision arithmetic routines, 
and finally examine the capabilities of the 
80x87 numeric coprocessor. As usual, the 
emphasis will be on practical rather than 
theoretical issues, although I will try to 
provide some of the more abstract refer- 
ences. 

BASIC TERMINOLOGY 

There are two pairs of terms that will crop 
up repeatedly in these discussions of com- 
puter arithmetic: single-precision versus 
double-precision integers, and signed ver- 
sus unsigned integers. We should agree on 
the meaning of these terms at the outset. 

The maximum size of a single-precision
integer varies from machine to machine,
but I shall take it always to denote a
number that will fit into a general register
and that can be operated on conveniently
with single machine instructions. It is also
a power-of-2 multiple of bytes. On the
8086, 8088, 80286, and the 80386 running
in real mode or in 16-bit protected mode, a
single-precision integer is 16 bits, or 2
bytes. On the 80386 in 32-bit protected
mode, a single-precision integer is 32 bits,
or 4 bytes.

Handling arithmetic operations in assembly
language requires a lot of care and attention
to logic; here are some proven routines to
add to your programming library to make it
easier.

As you'd expect, a double-precision in- 
teger is twice the size of a single-precision 
number for a given machine, and again, it 
is always a power-of-2 multiple of bytes. 
On the 8086, 8088, 80286, and 80386, in
real mode or 16-bit protected mode, a
double-precision integer is 32 bits. On the
80386 in 32-bit protected mode, a double- 
precision integer is 64 bits. Most double- 
precision integer operations enjoy only 
primitive support in the 80x86 instruction 
set, and — in the absence of a numeric co- 
processor — must be carried out with se- 
quences of machine instructions that are 
sometimes rather lengthy. 

The distinction between signed and unsigned
integers is straightforward. In a
signed integer, the most significant bit is
reserved for the arithmetic sign. The bit is
0 if the number is positive, 1 if the number
is negative. The remaining bits, read in
two's complement form, give the number's
magnitude. The range for a 16-bit signed
integer, for example, is from
-32,768 (8000H) to 32,767 (7FFFH). In
an unsigned integer, all bits, including the
most significant bit, indicate magnitude. A
16-bit unsigned integer can range from 0 to
65,535 (FFFFH).

But wait a minute, you may say — that 
unsigned 65,535 looks just like a signed 
-1! You're quite right: bits are bits,
and the "signedness" or "unsignedness" 
of a given bit pattern depends strictly on 
your point of view. But picking the right 
point of view is very important; a logical 
error in which a signed integer is treated as 
unsigned or vice versa can be the cause of 
quite subtle and difficult program bugs, as 
we shall see later. 
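To make the "point of view" idea concrete, here is a minimal sketch in C. It is an illustration only, not part of the original listings; the helper names `as_unsigned16`, `as_signed16`, and `demo_views` are hypothetical. It reads one 16-bit pattern both ways:

```c
#include <assert.h>
#include <stdint.h>

/* The unsigned view of a 16-bit pattern is simply the pattern itself. */
uint32_t as_unsigned16(uint16_t bits)
{
    return bits;
}

/* The signed (two's complement) view: if the sign bit (bit 15) is set,
   the value is the pattern minus 65,536. Written arithmetically so it
   does not rely on implementation-defined integer conversions. */
int32_t as_signed16(uint16_t bits)
{
    return (bits & 0x8000u) ? (int32_t)bits - 65536 : (int32_t)bits;
}

/* Usage sketch: the same bits, two different numbers. */
void demo_views(void)
{
    assert(as_unsigned16(0xFFFFu) == 65535u);   /* FFFFH unsigned */
    assert(as_signed16(0xFFFFu) == -1);         /* FFFFH signed   */
    assert(as_signed16(0x8000u) == -32768);     /* most negative  */
    assert(as_signed16(0x7FFFu) == 32767);      /* most positive  */
}
```

Nothing in the bit pattern itself records which reading is intended; that knowledge lives only in the code that consumes the value.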

SINGLE-PRECISION INTEGER ARITHMETIC 

The 80x86 CPU family supports single- 
precision integer addition, subtraction, 
multiplication, and division with the fol- 
lowing instructions: 

ADD     Single-precision addition
SUB     Single-precision subtraction
MUL     Unsigned single-precision multiplication
IMUL    Signed single-precision multiplication
DIV     Unsigned single-precision division
IDIV    Signed single-precision division

The instructions listed above are your
fundamental tools for working with the
family of single-precision integers, and
you must be thoroughly familiar with their
behavior, as well as any of their
idiosyncrasies. Related, but less important,
instructions are

NEG     Two's complement (multiply by -1)
CMP     Compare single-precision integers
CBW     Sign-extend 8 bits to 16 bits

ADD, SUB, NEG, and CMP set the 
CPU's flags (sign, carry, overflow, and 
zero are the most important) according to 
the result of the operation. Actually, CMP 
can be thought of as a sort of nondestruc- 
tive SUB that doesn't do anything but set 
the CPU flags (this is an easy way to re- 
member the order of CMP's operands). 

You probably noted that ADD, SUB, 
and CMP do not come in "signed" and 
"unsigned" varieties. This is because the 
"signed" or "unsigned" nature of the re- 
sult is solely in the eye of the beholder. If 
you want to regard the result as unsigned, 
you test the carry flag; if you prefer to think 
of the result as signed, you test the sign and 
overflow flags. The 80x86 family has an 
astonishingly diverse battery of condition- 
al jumps to provide for this and similar 
contingencies. 

For example, if you're performing a 
conditional branch after comparing two 
addresses (which are unsigned values), 
you would use the JB, JBE, JA, or JAE
instructions. After comparing two dollar
amounts (signed values), you would use 
the JL, JLE, JG, or JGE instructions. Test- 
ing the wrong flags or selecting the wrong 
type of conditional jump is a common 
cause of obscure program bugs — particu- 
larly when addresses are being calculated 
or compared. Such bugs may lie dormant 
for a long time and then bite suddenly 
when a change is made to a completely un- 
related part of the program. 
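The difference between the two families of conditional jumps can be modeled in a few lines of C. This is a rough sketch under stated assumptions: the structure `flags16` and the functions `cmp16`, `would_jb`, `would_jl`, and `demo_jumps` are hypothetical names invented for illustration, and only the four flags discussed above are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Flags left behind by a 16-bit CMP a,b (a - b, result discarded). */
struct flags16 {
    int carry;     /* CF: borrow out of bit 15, i.e. unsigned a < b */
    int sign;      /* SF: bit 15 of the result                      */
    int overflow;  /* OF: signed result did not fit in 16 bits      */
    int zero;      /* ZF: result was zero                           */
};

struct flags16 cmp16(uint16_t a, uint16_t b)
{
    uint16_t r = (uint16_t)(a - b);
    struct flags16 f;

    f.carry = a < b;
    f.sign = (r >> 15) & 1;
    /* Subtraction overflows when the operands differ in sign and the
       result's sign differs from the first operand's sign. */
    f.overflow = (((a ^ b) & (a ^ r)) >> 15) & 1;
    f.zero = (r == 0);
    return f;
}

/* JB ("jump if below") tests the carry flag: unsigned a < b. */
int would_jb(uint16_t a, uint16_t b)
{
    return cmp16(a, b).carry;
}

/* JL ("jump if less") tests SF != OF: signed a < b. */
int would_jl(uint16_t a, uint16_t b)
{
    struct flags16 f = cmp16(a, b);
    return f.sign != f.overflow;
}

/* Usage sketch: FFFFH versus 0001H gives opposite answers. */
void demo_jumps(void)
{
    assert(would_jb(0xFFFFu, 0x0001u) == 0);  /* 65,535 is not below 1 */
    assert(would_jl(0xFFFFu, 0x0001u) == 1);  /* -1 is less than 1     */
}
```

The same CMP feeds both jumps; choosing JB or JL is what commits the program to one interpretation of the operands.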

The multiply and divide instructions are 
a little more interesting and a little less reg- 
ular. MUL and IMUL affect only the carry 
and overflow flags, leaving the rest unde- 
fined; DIV and IDIV leave the state of all 
flags undefined. The signed instruc- 
tions — IMUL and IDIV — have slightly 
less range because they render special
treatment to the sign bit. Obviously, the
unsigned instructions — MUL and DIV — 
should always be used when you are work- 
ing with addresses. 

Earlier, I asserted glibly that the multiply
and divide instructions are single-precision
arithmetic operations. The
whole truth is not so simple. The multiply
instructions accept two single-precision
operands, but they produce a double-precision
result. One argument must always be
in register AX, while the other can be in
any other register or in memory; the result
always appears in registers DX and AX,
with the most significant part in DX. The
conventional notation for this latter
situation is DX:AX. (On the 80386 in 32-bit
protected mode, EAX and EDX are used
instead of AX and DX.)

Figure 1: Here is a double-precision assembly language multiplication routine for the 8086, 8088,
80286, and 80386. It accepts two 32-bit arguments and returns a 64-bit result.

        page    55,132
;
; Call with:    DX:AX           = double-precision argument 1
;               CX:BX           = double-precision argument 2
;
; Returns:      DX:CX:BX:AX     = quad-precision product
;
; Destroys:     nothing
;
_TEXT   segment word public 'CODE'

w0      equ     word ptr [bp-2]         ; local variables
w1      equ     word ptr [bp-4]
w2      equ     word ptr [bp-6]
w3      equ     word ptr [bp-8]

        assume  cs:_TEXT

        public  dmul
dmul    proc    near

        push    si                      ; save registers
        push    di
        push    bp
        mov     bp,sp                   ; set up stack frame
        sub     sp,8                    ; for forming result

        mov     di,dx                   ; save copy of argument 1
        mov     si,ax

        mul     bx                      ; arg1 low * arg2 low
        mov     w0,ax
        mov     w1,dx

        mov     ax,di                   ; arg1 high * arg2 high
        mul     cx
        mov     w2,ax
        mov     w3,dx

        mov     ax,di                   ; arg1 high * arg2 low
        mul     bx
        add     w1,ax                   ; accumulate result
        adc     w2,dx
        adc     w3,0

        mov     ax,si                   ; arg1 low * arg2 high
        mul     cx
        add     w1,ax                   ; accumulate result
        adc     w2,dx
        adc     w3,0

        pop     dx                      ; load quad-precision result
        pop     cx
        pop     bx
        pop     ax

        pop     bp                      ; restore registers
        pop     di
        pop     si
        ret                             ; and exit

dmul    endp

_TEXT   ends

        end
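The partial-product scheme that Figure 1 follows can be sketched in C. This is an illustrative model only, not a translation of the listing: the names `add_cross`, `dmul_sketch`, `words_to_u64`, and `demo_dmul` are hypothetical, and the four-word array stands in for the routine's local variables w0 through w3.

```c
#include <assert.h>
#include <stdint.h>

/* Add a 32-bit cross product into result words w[1]..w[3],
   propagating the carry the way ADD / ADC / ADC would. */
void add_cross(uint32_t p, uint16_t w[4])
{
    uint32_t sum;

    sum = (uint32_t)w[1] + (p & 0xFFFFu);            /* ADD w1,AX */
    w[1] = (uint16_t)sum;
    sum = (uint32_t)w[2] + (p >> 16) + (sum >> 16);  /* ADC w2,DX */
    w[2] = (uint16_t)sum;
    w[3] = (uint16_t)(w[3] + (sum >> 16));           /* ADC w3,0  */
}

/* Multiply two 32-bit values using only 16 x 16 -> 32 bit products.
   w[0] is the least significant word of the 64-bit result. */
void dmul_sketch(uint32_t arg1, uint32_t arg2, uint16_t w[4])
{
    uint32_t a_lo = arg1 & 0xFFFFu, a_hi = arg1 >> 16;
    uint32_t b_lo = arg2 & 0xFFFFu, b_hi = arg2 >> 16;
    uint32_t p;

    p = a_lo * b_lo;                  /* arg1 low * arg2 low   */
    w[0] = (uint16_t)p;
    w[1] = (uint16_t)(p >> 16);

    p = a_hi * b_hi;                  /* arg1 high * arg2 high */
    w[2] = (uint16_t)p;
    w[3] = (uint16_t)(p >> 16);

    add_cross(a_hi * b_lo, w);        /* arg1 high * arg2 low  */
    add_cross(a_lo * b_hi, w);        /* arg1 low * arg2 high  */
}

/* Reassemble the four words for checking against a native multiply. */
uint64_t words_to_u64(const uint16_t w[4])
{
    return ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32) |
           ((uint64_t)w[1] << 16) | (uint64_t)w[0];
}

/* Usage sketch: the largest operands exercise every carry path. */
void demo_dmul(void)
{
    uint16_t w[4];

    dmul_sketch(0xFFFFFFFFu, 0xFFFFFFFFu, w);
    assert(words_to_u64(w) == 0xFFFFFFFE00000001ULL);
}
```

The two cross products overlap the middle words of the result, which is why each one needs its own add-with-carry chain.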

The divide instructions accept a double- 
precision dividend and a single-precision 
divisor, and they produce a single-preci- 
sion quotient and remainder. The dividend 
is always taken from DX:AX; the divisor 
can be in any other register or in memory. 
The quotient is always left in register AX, 
while the remainder appears in DX. 
(Again, on the 80386 in 32-bit protected 
mode, EAX and EDX are used instead of 
AX and DX.)

Why this mixing of single and double- 
precision arguments and results, and why 
this special treatment of DX and AX? The 
reason is that you need to be able to use the 
multiply and divide instructions to scale a 
single-precision value (by multiplying, 
then dividing) through a double-precision 
intermediate without losing any precision.
Use of dedicated registers to provide argu- 
ments or to accept results is an explicit 
trade-off of instruction set orthogonality 
for more compact opcodes and therefore 
smaller programs. 
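The scaling idiom described above (multiply, then divide, through a double-width intermediate) can be illustrated with a short C sketch. The function names `scale16`, `scale16_truncated`, and `demo_scale` are hypothetical; the 32-bit variable `product` plays the role of DX:AX.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a 16-bit value by the ratio num/den, keeping the full 32-bit
   product as the intermediate, in the manner of MUL followed by DIV. */
uint16_t scale16(uint16_t value, uint16_t num, uint16_t den)
{
    uint32_t product;

    assert(den != 0);                    /* DIV faults on a zero divisor */
    product = (uint32_t)value * num;     /* MUL: 16 x 16 -> 32 bits      */
    assert(product / den <= 0xFFFFu);    /* DIV faults if the quotient
                                            will not fit in 16 bits      */
    return (uint16_t)(product / den);    /* DIV: 32 / 16 -> 16 bits      */
}

/* For contrast: discarding the upper half of the product first. */
uint16_t scale16_truncated(uint16_t value, uint16_t num, uint16_t den)
{
    uint16_t product = (uint16_t)((uint32_t)value * num);

    assert(den != 0);
    return (uint16_t)(product / den);
}

/* Usage sketch: three quarters of 50,000. */
void demo_scale(void)
{
    assert(scale16(50000u, 3, 4) == 37500u);           /* correct      */
    assert(scale16_truncated(50000u, 3, 4) == 4732u);  /* high bits
                                                          were lost    */
}
```

With the double-width intermediate, the final result is exact (apart from the discarded remainder) whenever it fits in a single-precision value.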

As an aside, it is interesting to note the 
claims by Apple (and Motorola) that the 
68000 in the original Macintosh is a 32-bit 
microprocessor. In spite of the fact that the 
68000 has 32-bit registers, its multiply in- 
struction works on 16-bit arguments to 
generate 32-bit results, and its divide in- 
struction returns 16-bit results. This alone 
is sufficient to reveal the 68000 as what it 
is: a 16-bit microprocessor that happens to 
have a lot of address lines! Only in the 
68020 and 68030 (used in the Mac SE/30 
and various Mac II models) do we find the 
true 32- by 32-bit multiply and 64- by 32- 
bit divide that are diagnostic of a true 32- 
bit processor. 

The 80286 and 80386 support an odd (but
handy) form of the IMUL instruction
that is not found on the 8086 and 8088. It is
one of the very few instructions in the
entire 80x86 family that has three operands:
the destination is always a register; one of
the source operands is a register or memory
address; and the other is an "immediate"
or literal value. This form of IMUL
has a number of other peculiarities: the
result of the operation is a single-precision
value rather than double; the result can go
to a register other than AX and DX; a
register argument need not be in AX or DX;
and one of the arguments is not (necessarily)
destroyed by the operation. For example,
to multiply the contents of CX by 10
and leave the result in register BX, you



Figure 2: This double-precision multiplication routine is for the 80386 CPU in 32-bit protected mode. It
accepts two 64-bit arguments and returns a 128-bit result.

;
; Call with:    EDX:EAX         = double-precision argument 1
;               ECX:EBX         = double-precision argument 2
;
; Returns:      EDX:ECX:EBX:EAX = quad-precision product
;
; Destroys:     nothing
;
_TEXT   segment dword public use32 'CODE'

w0      equ     dword ptr [ebp-4]       ; local variables
w1      equ     dword ptr [ebp-8]
w2      equ     dword ptr [ebp-12]
w3      equ     dword ptr [ebp-16]

        assume  cs:_TEXT

        public  dmul
dmul    proc    near

        push    esi                     ; save registers
        push    edi
        push    ebp
        mov     ebp,esp                 ; set up stack frame
        sub     esp,16                  ; for forming result

        mov     edi,edx                 ; save copy of argument 1
        mov     esi,eax

        mul     ebx                     ; arg1 low * arg2 low
        mov     w0,eax
        mov     w1,edx

        mov     eax,edi                 ; arg1 high * arg2 high
        mul     ecx
        mov     w2,eax
        mov     w3,edx

        mov     eax,edi                 ; arg1 high * arg2 low
        mul     ebx
        add     w1,eax                  ; accumulate result
        adc     w2,edx
        adc     w3,0

        mov     eax,esi                 ; arg1 low * arg2 high
        mul     ecx
        add     w1,eax                  ; accumulate result
        adc     w2,edx
        adc     w3,0

        pop     edx                     ; load quad-precision result
        pop     ecx
        pop     ebx
        pop     eax

        pop     ebp                     ; restore registers
        pop     edi
        pop     esi
        ret                             ; and exit

dmul    endp

_TEXT   ends

        end
would write

IMUL BX,CX,10 

I should mention that MUL, IMUL, 
DIV, and IDIV additionally support "half- 
precision" operations (operating on or re- 
turning 8-bit values). These are rarely used 
in the course of normal application pro- 
gramming and will not be referred to fur- 
ther in these columns. 

DOUBLE-PRECISION INTEGERS 

The 80x86 family's support for double- 
precision operations is discouragingly 
meager. In addition to the arithmetic in- 
structions we've already considered, you 
are provided with only the following: 

ADC     Single-precision addition with carry
SBB     Single-precision subtraction with carry (borrow)

These instructions, in essence, allow 
you to propagate the carry bit through the 
piecewise addition and subtraction of mul- 
tiple-precision values. For example, to add 
a double-precision value in DX:AX to a 
double-precision value in SI:DI, leaving 
the result in DX:AX, you would write 

ADD AX,DI ; lower half 

ADC DX,SI ; upper half 

Similarly, to subtract a double-precision 
value in SI:DI from a double-precision val- 
ue in DX:AX, leaving the result in 
DX:AX, you would write 

SUB AX,DI ; lower half 

SBB DX,SI ; upper half 
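The carry propagation performed by these instruction pairs can be modeled in C as follows. This is a sketch for illustration; the type `dword16` and the functions `dadd`, `dsub`, and `demo_carry` are hypothetical names, with the `lo` and `hi` members standing in for the lower-half and upper-half registers.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, like DX:AX or SI:DI. */
struct dword16 {
    uint16_t lo;   /* lower half (AX or DI) */
    uint16_t hi;   /* upper half (DX or SI) */
};

/* Double-precision addition: ADD on the lower halves,
   then ADC on the upper halves to absorb the carry. */
struct dword16 dadd(struct dword16 a, struct dword16 b)
{
    struct dword16 r;
    uint32_t sum = (uint32_t)a.lo + b.lo;           /* ADD AX,DI */

    r.lo = (uint16_t)sum;
    r.hi = (uint16_t)(a.hi + b.hi + (sum >> 16));   /* ADC DX,SI */
    return r;
}

/* Double-precision subtraction: SUB on the lower halves,
   then SBB on the upper halves to absorb the borrow. */
struct dword16 dsub(struct dword16 a, struct dword16 b)
{
    struct dword16 r;
    unsigned borrow = a.lo < b.lo;     /* SUB sets carry on a borrow */

    r.lo = (uint16_t)(a.lo - b.lo);                 /* SUB AX,DI */
    r.hi = (uint16_t)(a.hi - b.hi - borrow);        /* SBB DX,SI */
    return r;
}

/* Usage sketch: 0001FFFFH + 00000001H = 00020000H, and back again. */
void demo_carry(void)
{
    struct dword16 a = { 0xFFFFu, 0x0001u };
    struct dword16 b = { 0x0001u, 0x0000u };
    struct dword16 s = dadd(a, b);
    struct dword16 d = dsub(s, b);

    assert(s.lo == 0x0000u && s.hi == 0x0002u);
    assert(d.lo == 0xFFFFu && d.hi == 0x0001u);
}
```

The same pattern extends to any number of words: one ADD or SUB for the least significant word, then ADC or SBB for each word above it.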



Other loosely related instructions, use- 
ful mainly for conversion of single-preci- 
sion values to double-precision, are 



CWD     Sign-extend 16 bits to 32 bits
CDQ     Sign-extend 32 bits to 64 bits (80386 only)
MOVSX   Sign-extend 8 bits or 16 bits to 16 bits or 32 bits (80386 only)
MOVZX   Zero-extend 8 bits or 16 bits to 16 bits or 32 bits (80386 only)



To take the two's complement of a dou- 
ble-precision number, you can use the 
time-tested technique of flipping all the 
bits and then adding 1. For example, to 
change the sign of a double-precision num- 
ber in DX:AX, you would write 

NOT DX 

NOT AX 

ADD AX,1 

ADC DX,0 



A slightly faster technique relies on the 
fact that NEG sets the carry flag: 

NEG DX 
NEG AX 
SBB DX,0 
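Both negation techniques can be checked against each other with a small C model. This is a sketch under stated assumptions: the function names `dneg_not_add`, `dneg_neg_sbb`, and `demo_negate` are hypothetical, and the carry variable models only what NEG does to the carry flag (set unless the operand was zero).

```c
#include <assert.h>
#include <stdint.h>

/* Technique 1: flip all the bits, then add 1 with carry propagation. */
void dneg_not_add(uint16_t *hi, uint16_t *lo)
{
    uint32_t sum;

    *hi = (uint16_t)~*hi;                   /* NOT DX   */
    *lo = (uint16_t)~*lo;                   /* NOT AX   */
    sum = (uint32_t)*lo + 1;                /* ADD AX,1 */
    *lo = (uint16_t)sum;
    *hi = (uint16_t)(*hi + (sum >> 16));    /* ADC DX,0 */
}

/* Technique 2: negate each half, then subtract the borrow that the
   second NEG left in the carry flag. */
void dneg_neg_sbb(uint16_t *hi, uint16_t *lo)
{
    unsigned carry;

    *hi = (uint16_t)(0u - *hi);             /* NEG DX   */
    carry = (*lo != 0);                     /* NEG AX sets carry unless
                                               AX was zero             */
    *lo = (uint16_t)(0u - *lo);             /* NEG AX   */
    *hi = (uint16_t)(*hi - carry);          /* SBB DX,0 */
}

/* Usage sketch: negating 00000001H gives FFFFFFFFH either way. */
void demo_negate(void)
{
    uint16_t hi1 = 0, lo1 = 1;
    uint16_t hi2 = 0, lo2 = 1;

    dneg_not_add(&hi1, &lo1);
    dneg_neg_sbb(&hi2, &lo2);
    assert(hi1 == 0xFFFFu && lo1 == 0xFFFFu);
    assert(hi2 == hi1 && lo2 == lo1);
}
```

The second form saves an instruction because the borrow out of the lower half is already sitting in the carry flag after NEG.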

What about double-precision multipli- 
cation and division? Taking the single- 
precision native MUL, IMUL, DIV, and 
IDIV instructions as our guide, we know 



Figure 3: Corresponding to Figure 1, this double-precision division routine is for the 8086, 8088,
80286, and 80386 (real or 16-bit protected mode). It accepts a 64-bit dividend and 32-bit divisor,
returning a 32-bit quotient and 32-bit remainder.

;
; Call with:    DX:CX:BX:AX     = quad-precision dividend
;               SI:DI           = double-precision divisor
;
; Returns:      DX:AX           = double-precision quotient
;               CX:BX           = double-precision remainder
;
; Destroys:     SI, DI
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  ddiv
ddiv    proc    near

        push    bp              ; save register
        mov     bp,cx           ; BP = 3sw of dividend
        mov     cx,32           ; initialize loop counter
        clc                     ; carry flag initially clear
ddiv1:  rcl     ax,1            ; test this bit of dividend
        rcl     bx,1
        rcl     bp,1
        rcl     dx,1
        jnc     ddiv3           ; jump if bit was clear
ddiv2:  sub     bp,di           ; subtract divisor from dividend
        sbb     dx,si
        stc                     ; force carry flag set and
        loop    ddiv1           ; shift it into forming quotient
        jmp     ddiv5
ddiv3:  cmp     dx,si           ; dividend > divisor?
        jc      ddiv4           ; no, jump
        jne     ddiv2           ; yes, subtract divisor
        cmp     bp,di
        jnc     ddiv2           ; yes, subtract divisor
ddiv4:  clc                     ; force carry flag clear and
        loop    ddiv1           ; shift it into forming quotient
ddiv5:  rcl     ax,1            ; bring last bit into quotient
        rcl     bx,1
        mov     cx,bp
        xchg    dx,bx           ; put quotient in DX:AX
        xchg    cx,bx           ; put remainder in CX:BX
        pop     bp              ; restore register
        ret                     ; and exit

ddiv    endp

_TEXT   ends

        end
that a truly useful double-precision
multiply must process two double-precision
arguments to produce a quad-precision re- 
sult. Similarly, a fully generalized double- 
precision divide must accept a quad-preci- 
sion dividend and double-precision 
divisor, yielding a double-precision quo- 
tient and remainder. 
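The division routines in Figures 3 and 4 work one bit at a time, shifting the dividend left and subtracting the divisor whenever it fits. The following C sketch models that general shift-and-subtract idea for a 64-bit dividend and a 32-bit divisor. It is an illustration, not a transcription of either listing; the names `ddiv_sketch` and `demo_ddiv` are hypothetical, and it assumes the divisor is nonzero and the quotient fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 64-bit dividend by a 32-bit divisor, one bit per pass.
   Assumes divisor != 0 and (dividend >> 32) < divisor, so that the
   quotient fits in 32 bits. */
void ddiv_sketch(uint64_t dividend, uint32_t divisor,
                 uint32_t *quotient, uint32_t *remainder)
{
    uint32_t rem = (uint32_t)(dividend >> 32);  /* running remainder     */
    uint32_t quo = (uint32_t)dividend;          /* turns into quotient   */
    int i;

    assert(divisor != 0 && rem < divisor);

    for (i = 0; i < 32; i++) {
        /* Shift the pair rem:quo left one bit, remembering the bit that
           falls off the top (the carry flag's job after the RCL chain). */
        uint32_t carry = rem >> 31;

        rem = (rem << 1) | (quo >> 31);
        quo <<= 1;

        /* If the shifted remainder is at least the divisor, subtract
           and record a 1 bit in the quotient. When carry is set the
           true remainder exceeds 32 bits, so the subtraction wraps
           around to the correct 32-bit value. */
        if (carry || rem >= divisor) {
            rem -= divisor;
            quo |= 1;
        }
    }
    *quotient = quo;
    *remainder = rem;
}

/* Usage sketch: 2^32 divided by 3. */
void demo_ddiv(void)
{
    uint32_t q, r;

    ddiv_sketch(0x0000000100000000ULL, 3u, &q, &r);
    assert(q == 0x55555555u && r == 1u);
}
```

Each pass of the loop produces exactly one quotient bit, so the running time is fixed by the width of the quotient rather than by the operand values.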

At this point, your intuition as a veteran 
80x86 programmer is probably whispering 
that you are about to run short of registers. 
The problems actually go far deeper than
this, however. You might reasonably hope
that the built-in single-precision multiply 
and divide instructions could be employed 
as useful building blocks for double-preci- 
sion (or multiple-precision) multiply and 
divide routines. Unfortunately, fate is not 
so kind. 

The hard reality is that the hardware's 
single-precision multiply instruction is 
only marginally helpful when used for 
stepwise multiple-precision multiplication 
operations in the "obvious" manner. 
That's because MUL and IMUL are quite 
slow on the older 8086 and 8088 proces- 
sors. As for multiple-precision divides, the 
hardware's built-in divide instruction is 
(for all practical purposes) useless. Al- 
though Donald Knuth has described an al- 
gorithm for a multiple-precision divide 
that is constructed on single-precision di- 
vides, it is quite complex and — worse 
yet — not really very fast. 

In the next installment I'll discuss the 
hoary shift-and-add (for multiplication) 



er history. We'll then use these algorithms 
as the basis of multiple-precision multipli- 
cation and division routines capable of 
processing arguments of any size. In the
meantime (just to tide you over and give
you some code to look at), Figures 1, 2, 3,
and 4 contain the source code for
double-precision multiplication and division
subroutines that are somewhat faster (because



listings. 
THE IN-BOX 

Please send your questions, suggestions, 
and comments to me at any of the follow- 
ing e-mail addresses: 
PCMagNet: 72241,52 
MCI Mail: rduncan 

BIX: rduncan



Figure 4: This double-precision division routine for the 80386 in 32-bit protected mode accepts a 128-bit
dividend and 64-bit divisor and returns a 64-bit quotient and 64-bit remainder.

;
; Call with:    EDX:ECX:EBX:EAX = quad-precision dividend
;               ESI:EDI         = double-precision divisor
;
; Returns:      EDX:EAX         = double-precision quotient
;               ECX:EBX         = double-precision remainder
;
; Destroys:     ESI, EDI
;
_TEXT   segment dword public use32 'CODE'

        assume  cs:_TEXT

        public  ddiv
ddiv    proc    near

        push    ebp             ; save register
        mov     ebp,ecx         ; EBP = 3sw of dividend
        mov     ecx,64          ; initialize loop counter
        clc                     ; carry flag initially clear
ddiv1:  rcl     eax,1           ; test this bit of dividend
        rcl     ebx,1
        rcl     ebp,1
        rcl     edx,1
        jnc     ddiv3           ; jump if bit was clear
ddiv2:  sub     ebp,edi         ; subtract divisor from dividend
        sbb     edx,esi
        stc                     ; force carry flag set and
        loop    ddiv1           ; shift it into forming quotient
        jmp     ddiv5
ddiv3:  cmp     edx,esi         ; dividend > divisor?
        jc      ddiv4           ; no, jump
        jne     ddiv2           ; yes, subtract divisor
        cmp     ebp,edi
        jnc     ddiv2           ; yes, subtract divisor
ddiv4:  clc                     ; force carry flag clear and
        loop    ddiv1           ; shift it into forming quotient
ddiv5:  rcl     eax,1           ; bring last bit into quotient
        rcl     ebx,1
        mov     ecx,ebp
        xchg    edx,ebx         ; put quotient in EDX:EAX
        xchg    ecx,ebx         ; put remainder in ECX:EBX
        pop     ebp             ; restore register
        ret                     ; and exit

ddiv    endp

_TEXT   ends

        end




Power Programming

Arithmetic Routines For Your Computer Programs, Part 2

by Ray Duncan



Last time, I discussed single- and double- 
precision integer arithmetic operations on 
the 80x86 family of processors. In this
column, I want to generalize those techniques
to cover integer addition, subtraction, and 
multiplication using any precision you 
might want or need in your programs. In 
the next installment, I'll talk a little more 
about multiplication and then address mul- 
tiple-precision division. 

Before beginning, however, a brief di- 
gression into the more-treacherous waters 
of data formats is in order. 

BIG-ENDIANS AND LITTLE-ENDIANS 

When you program in a high-level lan- 
guage, you generally do not need (or want) 
to know how the component bytes of an 
arithmetic value are laid out in memory. If 
you program in an assembly language, on 
the other hand, an understanding of binary
data formats is absolutely vital. If you are
going to load an integer from memory into 
registers (or vice versa), you clearly need 
to know which end of the integer is which. 

You may be surprised to hear that the 
world is divided into two hostile camps 
(wags have dubbed them the "Big- 
Endians" and the "Little-Endians") over 
this seemingly innocuous issue. The Big- 
Endians are committed to a data format 
that puts the most significant byte of an in- 
teger at the lowest memory address, the 
next most significant byte at the next-high- 
er address, and so on. The Little-Endians, 
by contrast, are firm believers in a data for- 
mat in which the least significant byte of 
the number is placed at the lowest memory 
address and the most significant byte at the 
highest memory address occupied by the 
number. 

To make the difference in storage techniques
more concrete, consider the 32-bit
integer 12345678h, which is composed of
four bytes. On a Little-Endian CPU, the
four bytes would be laid out in memory
thus:

78h 56h 34h 12h

with 78h occupying the lowest address and
12h the highest. On a Big-Endian CPU, on
the other hand, again moving upward from
the lowest to the highest address, the four
bytes would be arranged in memory as
follows:

12h 34h 56h 78h

This installment covers addition,
subtraction, and multiplication, both in
generalized C-like form and in full assembly
language routines.

Although there are many different ex- 
amples of CPUs that use each of these data 
formats, the front lines in this silly little 
war are manned by the Intel 80x86 users
on the side of the Little-Endians and by the 
Motorola 680x0 users on the side of the 
Big-Endians. When a Macintosh program- 
mer meets a PC programmer, I'm often 
amazed at the intensity of the feelings 
aroused by this seemingly trivial issue. 

Before you get involved in any such 
heated discussions yourself, just remem- 
ber that these data formats are only con- 
ventions and that equally efficient CPUs 
can be built using either one. The Little- 
Endian approach, in which the signifi- 
cance of a byte ascends with its address, 
seems perfectly logical and intuitive to me, 
but I'll be the first to admit that hex dumps 
of memory are far easier to read and inter- 
pret on a Big-Endian machine. In any 
event, I'll be using the Little-Endian
format exclusively in the arithmetic routines
to be developed in this column, both for 
the sake of consistency and to make it easi- 
er to plug in the use of an 80x87 numeric 
coprocessor later. 

Interestingly, the 80486 processor has a 
new instruction, called BSWAP, whose 
only purpose is to transform a 32-bit data 
value in a register from Big-Endian format 
to a Little-Endian format or back again. In 
other words, it performs the same function 
on 32-bit values as the XCHG instruction 
does on 16-bit values. For example, if you 
had a 32-bit value in register EAX, the in- 
struction 

BSWAP EAX 

would be exactly equivalent to (but much 
faster than) the sequence 

XCHG AH, AL 
ROL EAX, 16 
XCHG AH, AL 
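The two byte orders, and the three-step swap sequence shown above, can be modeled in C. This is a sketch for illustration only; the function names `store_little_endian`, `store_big_endian`, `swap_low_bytes`, `bswap32_sketch`, and `demo_endian` are hypothetical, and the code manipulates values arithmetically rather than relying on the byte order of the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value least significant byte first (Little-Endian).
   out[0] is the lowest memory address. */
void store_little_endian(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Store a 32-bit value most significant byte first (Big-Endian). */
void store_big_endian(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Model of XCHG AH,AL: swap the two low-order bytes of a 32-bit value. */
uint32_t swap_low_bytes(uint32_t v)
{
    return (v & 0xFFFF0000u) | ((v & 0x00FFu) << 8) | ((v & 0xFF00u) >> 8);
}

/* Model of the three-instruction sequence equivalent to BSWAP EAX. */
uint32_t bswap32_sketch(uint32_t v)
{
    v = swap_low_bytes(v);        /* XCHG AH,AL */
    v = (v << 16) | (v >> 16);    /* ROL EAX,16 */
    v = swap_low_bytes(v);        /* XCHG AH,AL */
    return v;
}

/* Usage sketch: swapping 12345678h turns one layout into the other. */
void demo_endian(void)
{
    uint8_t le[4], be[4];

    store_little_endian(0x12345678u, le);
    store_big_endian(0x12345678u, be);
    assert(le[0] == 0x78 && le[1] == 0x56 && le[2] == 0x34 && le[3] == 0x12);
    assert(be[0] == 0x12 && be[1] == 0x34 && be[2] == 0x56 && be[3] == 0x78);
    assert(bswap32_sketch(0x12345678u) == 0x78563412u);
}
```

Applying the swap twice returns the original value, which is why a single instruction serves for conversion in both directions.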



MULTIPLE-PRECISION ALGORITHMS 

Whenever you need to perform addition, 
subtraction, multiplication, or division to a 
degree of precision beyond what your 
CPU's native machine instructions sup- 
port, you are led directly to the so-called 
classical algorithms for these operations. 
The classical algorithms are the underpin- 
nings of the stepwise, methodical proce- 
dures we all learned in grade school, using 
paper and pencil, for doing arithmetic on 
numbers with more than one digit. They 
are called classical because their history 
extends far back before the dawn of the
computer age. In fact, as Donald Knuth
points out, the word "algorithm" was 
used exclusively in this sense for several 
centuries, before it acquired its modern, 
more general meaning. 

Figures 1 through 3 contain C-like 
pseudo-code that demonstrates the classi- 
cal algorithms for addition, subtraction, 
and multiplication. (I'm deferring multi- 
ple-precision division to the next install- 
ment.) The pseudo-code shown is mod- 
eled on Knuth's MIX assembly language 
listings in The Art of Computer Programming,
Volume II: Seminumerical Algorithms
(section 4.3), a definitive work that
should always be your ultimate recourse 
on these and related topics. 

In Figures 1 through 3, u[] and v[] are 
arrays that hold arguments in base b, one 
digit per array element. The actual physi- 
cal size of each array element is irrelevant, 
so long as it is large enough to hold a number
of magnitude b - 1. The value m represents
the maximum number of digits in
each argument, and the result is formed in 
array w[]. The variable k represents the 
"carry," which is set to the excess when 
the result of an operation does not fit into a 
single digit. 

To illustrate how these arrays and vari- 
ables are used, let's consider what happens 
when we add the very first digits of two 
multiple-precision numbers together. The 
value of the first result digit and the result- 
ing carry are found as follows: 

w[0] = (u[0] + v[0]) mod b 
k = (u[0] + v[0]) / b

Subsequent digits (1 through m - 1) in the 
result are found in the same way, except 
that the previous value of k is included, as 
follows: 

w[i] = (u[i] + v[i] + k) mod b 
k = (u[i] + v[i] + k) / b 

One interesting aspect of these classical al- 
gorithms is that they apply equally well to 
numbers in any base whatever. You can 
choose to view your arguments and results 
as bit arrays (base 2), or you can group the 
bits together and work on octal numbers 
(base 8) or hexadecimal numbers (base 
16); you can even allow the natural byte or 
word size of the CPU to be an individual 
"digit." 



ADDITION PSEUDO-CODE    COMPLETE LISTING

    int m;                      // number of digits
    int i;                      // index variable
    int b;                      // base
    int k;                      // carry

    array u[m], v[m];           // holds arguments
    array w[m];                 // receives results

    k = 0;                      // initialize carry

    for(i = 0; i < m; i++)      // add digit by digit
    {
        w[i] = (u[i] + v[i] + k) mod b
        k = (u[i] + v[i] + k) / b
    }

Figure 1: In this simplified, C-like pseudo-code for multiple-precision addition, both arguments and
the result are assumed to be nonnegative. The carry k always takes the value 0 or 1.
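
As a rough illustration, Figure 1 might be rendered in real C along the following lines. The function name mp_add and its parameter order are assumptions made for this sketch. Digits are stored least significant first, and the caller is assumed to pick a base b small enough that 2b - 1 fits in an unsigned int.

```c
#include <stddef.h>

/* w = u + v, where u, v, and w each hold m digits in base b,
   least significant digit first. Returns the final carry (0 or 1).
   Assumes every input digit is already in the range 0..b-1. */
unsigned mp_add(const unsigned *u, const unsigned *v, unsigned *w,
                size_t m, unsigned b)
{
    unsigned k = 0;                     /* carry */

    for (size_t i = 0; i < m; i++) {
        unsigned t = u[i] + v[i] + k;   /* at most 2b - 1 */
        w[i] = t % b;                   /* result digit */
        k = t / b;                      /* carry into next digit */
    }
    return k;
}
```

Adding 789 and 456 in base 10 with m = 3, for example, leaves the digits 5, 4, 2 in w[] and returns a final carry of 1, which together stand for 1245. The same function works unchanged for base 2, 16, or 256.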



SUBTRACTION PSEUDO-CODE    COMPLETE LISTING

    int m;                      // number of digits
    int i;                      // index variable
    int b;                      // base
    int k;                      // carry

    array u[m], v[m];           // holds arguments
    array w[m];                 // receives results

    k = 0;                      // initialize carry

    for(i = 0; i < m; i++)      // subtract digit by digit
    {
        w[i] = (u[i] - v[i] + k) mod b
        k = (u[i] - v[i] + k) / b
    }

Figure 2: Multiple-precision subtraction in C-like pseudo-code. In this simplified presentation, both
arguments and the result are assumed to be nonnegative, and the argument in array u[] is assumed to
be greater than or equal to the argument in v[]. The carry k always takes the value 0 or -1.
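
A literal transcription of Figure 2 into C would not behave as the caption describes, because C's / and % operators truncate toward zero, whereas the pseudo-code needs a quotient of -1 and a nonnegative remainder whenever a digit difference goes negative. The sketch below sidesteps this by testing for a negative difference explicitly. The name mp_sub and its signature are assumptions made for illustration.

```c
#include <stddef.h>

/* w = u - v, where u, v, and w each hold m digits in base b,
   least significant digit first. Returns the final carry:
   0 if u >= v, or -1 if a borrow ran off the top digit. */
int mp_sub(const int *u, const int *v, int *w, size_t m, int b)
{
    int k = 0;                          /* carry: always 0 or -1 */

    for (size_t i = 0; i < m; i++) {
        int t = u[i] - v[i] + k;        /* ranges from -b to b - 1 */
        if (t < 0) {
            w[i] = t + b;               /* same as t mod b */
            k = -1;                     /* borrow from next digit */
        } else {
            w[i] = t;
            k = 0;
        }
    }
    return k;
}
```

Subtracting 456 from 1245 in base 10 with m = 4 leaves the digits 9, 8, 7, 0 in w[] and returns 0.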



MULTIPLICATION PSEUDO-CODE    COMPLETE LISTING

    int m;                              // number of arg. digits
    int i, j;                           // index variables
    int b;                              // base
    int k;                              // carry
    int t;                              // scratch variable
    array u[m], v[m];                   // holds arguments
    array w[m*2];                       // receives product

    for(i = 0; i < m*2; i++)            // initialize product
    {
        w[i] = 0;
    }

    for(i = 0; i < m; i++)              // sum partial products
    {
        k = 0;                          // initialize carry
                                        // find this partial product
        for(j = 0; j < m; j++)
        {
            t = u[j] * v[i] + w[i+j] + k;
            w[i+j] = t mod b;           // digit of partial product
            k = t / b;                  // calculate carry
        }
        w[i+m] = k;                     // highest digit of
                                        // partial product
    }

Figure 3: This C-like pseudo-code for multiple-precision multiplication exploits the CPU's native
multiply instruction. The square of the base must be less than or equal to 1 plus the largest value
that fits in the result of the hardware's unsigned multiply instruction. In this simplified presentation,
both of the arguments and the result are assumed to be nonnegative, and both arguments are the same size.
The value of the carry k always satisfies the condition 0 <= k < b, where b is the base.
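
For readers who want to experiment, here is one possible C rendering of Figure 3 with the base fixed at 256, so that each digit is one byte. The function name mp_mul is an assumption for this sketch. The scratch variable t never exceeds 255 * 255 + 255 + 255 = 65535, which is why a byte-sized digit pairs naturally with a multiply that yields a 16-bit result.

```c
#include <stddef.h>
#include <stdint.h>

/* w = u * v in base 256. u and v hold m bytes each and w must
   hold 2*m bytes; all are least significant byte first. */
void mp_mul(const uint8_t *u, const uint8_t *v, uint8_t *w, size_t m)
{
    for (size_t i = 0; i < 2 * m; i++)      /* initialize product */
        w[i] = 0;

    for (size_t i = 0; i < m; i++) {        /* sum partial products */
        unsigned k = 0;                     /* initialize carry */

        for (size_t j = 0; j < m; j++) {
            unsigned t = (unsigned)u[j] * v[i] + w[i + j] + k;
            w[i + j] = (uint8_t)(t & 0xFFu);    /* t mod b */
            k = t >> 8;                         /* t / b */
        }
        w[i + m] = (uint8_t)k;              /* highest digit */
    }
}
```

Multiplying FFFFh by FFFFh with m = 2 leaves the bytes 01h, 00h, FEh, FFh in w[], that is, FFFE0001h.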

The multiplication algorithm shown here differs slightly from
the longhand technique you probably learned in school, in that
the partial products are accumulated on the fly. When you
perform long multiplication with pencil and paper, you normally
find all the partial products first, then add them all up at
the end of the calculation.

One particularly nice feature of this version of the
multiplication algorithm is that it lets you use your CPU's
native hardware multiply, if one is available. You need only
pick a base such that the square of the base is less than or
equal to 1 plus the largest value that fits in the result of
the CPU's unsigned multiply instruction. If no hardware
multiply is available, of course, you simply fall back on
base 2, in which case the algorithm degenerates into the



MPNEG.ASM    COMPLETE LISTING

        title   MPNEG.ASM Multiple-Precision 2's Complement
        page    55,132
;
; MPNEG.ASM     Multiple-Precision 2's Complement Routine
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of argument
;               CX    = argument length in bytes
;
;               Assumes direction flag is clear at entry
;
; Returns:      ES:DI = address of result
;
; Destroys:     Nothing
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpneg
mpneg   proc    near

        mov     di,si                   ; save address of result
        push    cx                      ; save two copies of
        push    cx                      ; argument length

mpneg1: not     byte ptr [si]           ; 1's complement this digit
        inc     si                      ; advance through argument
        loop    mpneg1                  ; until all digits inverted

        pop     cx                      ; retrieve length of argument
        mov     si,di                   ; retrieve first-byte-address
        stc                             ; set carry to add 1

mpneg2: adc     byte ptr [si],0         ; add 1 to 1's complement
        inc     si                      ; to get 2's complement
        loop    mpneg2                  ; until all digits finished

        pop     cx                      ; restore operand length
        mov     si,di                   ; restore argument address
        ret                             ; back to caller

mpneg   endp
_TEXT   ends
        end

Figure 4: MPNEG.ASM is a general-purpose two's complement routine that changes the sign of multiple-precision integers.
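
The two passes of MPNEG (invert every byte, then add 1 with carry propagation) can be mirrored in C roughly as follows. This is a sketch for experimentation rather than a translation of the listing; the name mp_neg is an assumption, and the integer is stored least significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Negate, in place, an n-byte two's complement integer stored
   least significant byte first. */
void mp_neg(uint8_t *a, size_t n)
{
    unsigned carry = 1;                 /* the "+ 1" of two's complement */

    for (size_t i = 0; i < n; i++)      /* pass 1: one's complement */
        a[i] = (uint8_t)~a[i];

    for (size_t i = 0; i < n; i++) {    /* pass 2: add 1, ripple carry */
        unsigned t = (unsigned)a[i] + carry;
        a[i] = (uint8_t)t;
        carry = t >> 8;
    }
}
```

Negating the two-byte value 0001h gives FFFFh (that is, -1), and negating 0100h gives FF00h (that is, -256).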



MPADD.ASM    COMPLETE LISTING

        title   MPADD.ASM Multiple-Precision Integer Addition
        page    55,132
;
; MPADD.ASM     Multiple-Precision Integer Addition
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of source operand
;               ES:DI = address of destination operand
;               CX    = operand length in bytes
;
;               Assumes direction flag is clear at entry
;
; Returns:      ES:DI = address of result
;
; Destroys:     AL, CX, SI (other registers preserved)
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpadd
mpadd   proc    near

        push    di                      ; save address of result
        clc                             ; carry initially clear

mpadd1: lodsb                           ; next byte from source
        adc     byte ptr es:[di],al     ; accumulate sum
        inc     di
        loop    mpadd1                  ; until all bytes processed

        pop     di                      ; restore address of result
        ret                             ; back to caller

mpadd   endp
_TEXT   ends
        end

Figure 5: MPADD.ASM is a general-purpose addition routine for multiple-precision integers.



MPSUB.ASM    COMPLETE LISTING

        title   MPSUB.ASM Multiple-Precision Integer Subtraction
        page    55,132
;
; MPSUB.ASM     Multiple-Precision Integer Subtraction
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of source operand
;               ES:DI = address of destination operand
;               CX    = operand length in bytes
;
;               Assumes direction flag is clear at entry
;
; Returns:      ES:DI = address of result (destination - source)
;
; Destroys:     AL, CX, SI (other registers preserved)
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpsub
mpsub   proc    near

        push    di                      ; save address of result
        clc                             ; carry initially clear

mpsub1: lodsb                           ; next byte from source
        sbb     byte ptr es:[di],al     ; subtract from destination
        inc     di
        loop    mpsub1                  ; until all bytes processed

        pop     di                      ; restore address of result
        ret                             ; back to caller

mpsub   endp
_TEXT   ends
        end

Figure 6: MPSUB.ASM is a general-purpose subtraction routine for multiple-precision integers.
ESI PC MAGAZINE NOVEMBER 28, 1989 



MPMUL1.ASM    COMPLETE LISTING

        title   MPMUL1.ASM Multiple-Precision Unsigned Multiply
        page    55,132
;
; MPMUL1.ASM    Multiple-Precision Unsigned Multiply
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode
;
; Copyright (C) 1989 Ziff Davis Communications
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of source operand
;               ES:DI = address of destination operand
;               CX    = operand length in bytes
;
;               Assumes direction flag is clear at entry
;               Assumes DS = ES <> SS
;               Assumes CX <= 255
;
; Returns:      ES:DI = address of product
;
;               NOTE: Buffer for destination operand must be
;               twice as long as the actual operand, because
;               it will receive a double-precision result.
;
; Destroys:     AX (other registers preserved)
;
; Usage:        DS:SI = u[0]  base address source operand
;               SS:BP = v[0]  base address destination operand
;               ES:DI = w[0]  base address of product
;               BX    = i     index for outer loop
;               CX    = j     index for inner loop
;               DH    = m     operand length in bytes
;               DL    = k     remainder of partial products
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpmul1
mpmul1  proc    near

        push    bx                      ; save registers
        push    dx
        push    bp

        sub     sp,cx                   ; make buffer on stack
        mov     bp,sp                   ; for destination operand
        mov     dh,cl                   ; save operand length (m)

        push    cx                      ; copy destination operand
        push    si                      ; to temporary storage in
        push    di                      ; stack frame, because result
        push    es                      ; will be built in destination
        push    ss                      ; operand's buffer
        pop     es
        mov     si,di
        mov     di,bp
        rep movsb
        pop     es
        pop     di
        pop     si
        pop     cx

        push    di                      ; initialize destination buffer
        xor     ax,ax                   ; to receive result (it better be
        rep stosw                       ; twice the size of the operands)
        pop     di

        xor     bx,bx                   ; i = 0

mpmul11: xor    dl,dl                   ; k = 0
        xor     cx,cx                   ; j = 0

mpmul12: xchg   bx,cx                   ; get u[j]
        mov     al,[si+bx]
        xchg    bx,cx
        xchg    bp,di
        mov     ah,ss:[di+bx]           ; get v[i]
        xchg    bp,di
        mul     ah                      ; t = u[j] * v[i]
        add     al,dl                   ;     + k
        adc     ah,0
        add     bx,cx
        add     al,[bx+di]              ;     + w[i+j]
        adc     ah,0
        mov     [bx+di],al              ; w[i+j] = t mod b
        mov     dl,ah                   ; k = t / b
        sub     bx,cx                   ; restore i
        inc     cx                      ; j++
        cmp     cl,dh                   ; j = m?
        jne     mpmul12                 ; no, repeat inner loop

        push    bx
        add     bl,dh                   ; w[i+m] = k
        adc     bh,0
        mov     [di+bx],ah
        pop     bx

        inc     bx                      ; i++
        cmp     bl,dh                   ; i = m?
        jne     mpmul11                 ; no, repeat outer loop

        add     sp,bx                   ; discard operand buffer
        pop     bp                      ; restore registers
        pop     dx
        pop     bx
        ret                             ; back to caller

mpmul1  endp
_TEXT   ends
        end

Figure 7: MPMUL1.ASM is a general-purpose unsigned multiplication routine for multiple-precision integers.



MPIMUL.ASM    COMPLETE LISTING

        title   MPIMUL.ASM Multiple-Precision Signed Multiply
        page    55,132
;
; MPIMUL.ASM    Multiple-Precision Signed Multiply
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode.
;               Requires MPNEG.ASM (multiple-precision
;               2's complement) and MPMUL1.ASM (multiple-
;               precision unsigned integer multiply).
;
; Copyright (C) 1989 Ziff Davis Communications
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of source operand
;               ES:DI = address of destination operand
;               CX    = operand length in bytes
;
;               Assumes direction flag is clear at entry
;               Assumes DS = ES <> SS
;               Assumes CX <= 255
;
; Returns:      ES:DI = address of product
;
;               NOTE: Buffer for destination operand must be
;               twice as long as the actual operand, because
;               it will receive a double-precision result.
;
; Destroys:     AX (other registers preserved)
;
_TEXT   segment word public 'CODE'

        extrn   mpmul1:near
        extrn   mpneg:near

        assume  cs:_TEXT

        public  mpimul
mpimul  proc    near

        push    bx                      ; save register

        mov     bx,cx                   ; take Exclusive-OR of
        mov     al,[si+bx-1]            ; signs of operands
        xor     al,[di+bx-1]
        pushf                           ; save sign of result

        test    byte ptr [si+bx-1],80h  ; source operand negative?
        jz      mpim1                   ; no, jump
        push    di                      ; yes, flip sign of
        call    mpneg                   ; source operand
        pop     di

mpim1:  test    byte ptr [di+bx-1],80h  ; destination operand negative?
        jz      mpim2                   ; no, jump
        push    si                      ; yes, flip sign of
        mov     si,di                   ; destination operand
        call    mpneg
        pop     si

mpim2:  call    mpmul1                  ; perform unsigned multiply
        popf                            ; retrieve sign of result
        jns     mpim3                   ; jump, result is positive
        push    si                      ; operand signs were not
        push    cx                      ; same, make result negative
        mov     si,di
        shl     cx,1
        call    mpneg
        pop     cx
        pop     si

mpim3:  pop     bx                      ; restore register
        ret                             ; back to caller

mpimul  endp
_TEXT   ends
        end

Figure 8: MPIMUL.ASM is a general-purpose signed multiplication routine for multiple-precision integers.
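
The sign-handling strategy of Figure 8 (note whether the operand signs differ, multiply the magnitudes as unsigned numbers, then negate the product if necessary) can be tried out at small scale in C. The sketch below applies it to a single pair of 16-bit values rather than to multiple-precision buffers; the name imul16 is an assumption made for this example.

```c
#include <stdint.h>

/* Signed 16 x 16 -> 32-bit multiply built on an unsigned multiply. */
int32_t imul16(int16_t a, int16_t b)
{
    /* Exclusive-OR of the operand signs gives the sign of the result. */
    int negative = (a < 0) != (b < 0);

    /* Take magnitudes; widening first keeps -32768 representable. */
    uint16_t ua = (uint16_t)(a < 0 ? -(int32_t)a : (int32_t)a);
    uint16_t ub = (uint16_t)(b < 0 ? -(int32_t)b : (int32_t)b);

    /* Unsigned multiply; the product is at most 2^30. */
    uint32_t p = (uint32_t)ua * ub;

    return negative ? -(int32_t)p : (int32_t)p;
}
```

For instance, imul16(-3, 7) returns -21, and imul16(-32768, -32768) returns 1073741824.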
more familiar shift-and-add method of
multiplication. 

MULTIPLE-PRECISION ROUTINES 

Figures 4 through 8 are the source listings 
of the first multiple-precision integer arith- 
metic modules in our Power Programming 
library, which are as follows: 

MPNEG    Change sign of a multiple-precision integer
MPADD    Multiple-precision integer addition
MPSUB    Multiple-precision integer subtraction
MPMUL1   Multiple-precision unsigned integer multiplication
MPIMUL   Multiple-precision signed integer multiplication

The logic of the addition, subtraction, 
and multiplication routines follows the 
flow of the pseudo-code listings quite 
closely. The change-sign procedure employs
the familiar trick of taking the one's
complement of the entire integer, then
adding 1.

The calling sequence for these various 
multiple-precision routines is documented 
in their source-code listings. In general, 
CX is used to pass the length of the argu- 
ments, which are assumed always to be the 
same size. DS:SI points to one argument 
and ES:DI points to the other. The DS:SI 
argument is referred to as the source and 
the ES:DI argument as the destination, 
which preserves symmetry with operand 
usage in the CPU's native ADD and SUB 
instructions. 

The result of the operation always re- 
places the destination argument, and the 
address of the result is returned in ES:DI. 
One warning is in order: when calling the 
multiple-precision multiply routines, you 
must make certain that the buffer that holds 
the destination argument is twice as large 
as the argument itself so that it will be able 
to hold the product of the two arguments. 

THE IN-BOX 

Please send your questions, comments, 
and suggestions to me at any of the follow- 
ing e-mail addresses: 
PCMagNet: 72241,52 
MCI Mail: rduncan 

BIX: rduncan



Arithmetic Routines For Your Computer Programs, Part 3



The classical arithmetic algorithms that 
underlie the longhand procedures we all 
use for integer addition, subtraction, mul- 
tiplication, and division have been well 
understood for hundreds of years. Indeed, 
the term algorithm originally referred only 
to the formalized procedures for these 
arithmetic operations and is actually a cor- 
ruption of the name of renowned Arab 
mathematician al-Khwarizmi.

The classical algorithms are important 
not only to schoolchildren but to program- 
mers and computer designers as well. The 
algorithms are the foundation for hardware 
adders, multipliers, and dividers, and for 
the design of software routines that can 
carry out arithmetic operations that are not 
supported in hardware. 

Schoolchildren are typically taught the 
how without the why when it comes to ba- 
sic arithmetic. I found it quite enlightening 
to look closely at the classical algorithms 
(particularly for multiplication and divi- 
sion) and to realize the extent to which I 
had been basing these longhand proce- 
dures on faith rather than understanding. 

The aspiring programmer's ultimate re- 
source on the classical algorithms (and on 
a mind-boggling assortment of other topics 
as well) is Donald Knuth's The Art of 
Computer Programming. Knuth combines 
a gift for clear writing with a depth of 
mathematical insight and a breadth of 
knowledge and experience that have few 
parallels in these days of superspecialized 
professors. Dr. Knuth needs no favorable 
reviews from me, of course; although his 
three volumes (so far) may at first appear 
intimidating, it is a truism to say that they 
should be on the bookshelves of all but the 
most casual programmer. 

Knuth's discussion of the classical algorithms appears on
pages 229-45 of volume 2, Seminumerical Algorithms.
Unfortunately, however, his program examples are rendered in
MIX, the assembly language of a hypothetical CPU for which
simulators exist only in the halls of academe. Accordingly, in
the last installment of this column, I presented high-level,
radix-independent, pseudo-C translations of Knuth's example
routines for addition, subtraction, and multiplication. We then
used this pseudo-code as a guide for the implementation of
corresponding assembly language subroutines.

Implementations of the classical algorithms for multiplication
and division that you can use in your own programs round out
this series on arithmetic operations.

I don't plan to take this approach for di- 
vision, however, because the classical al- 
gorithm for radix-independent division is 
rather complex and subtle. If you recall 
long division as one of the major sore 
points of your first few years of grade 
school — something that caused signifi- 
cantly more mental anguish than addition, 
subtraction, and multiplication — there is a 
good reason for it. Long division requires 
normalizations, groupings, and "trial di- 
vides" that do not reduce readily into a 
simple, easily understood piece of radix- 
independent pseudo-C code. 

Luckily, however, there is a solution that will suffice nicely
for the purposes of this column. When working in binary
(radix = 2), the classical division algorithm degenerates to a
considerably simpler form that we typically see implemented in
a shift-and-subtract loop. The multiple trial divides that are
often needed for each forward step in the generalized form of
the algorithm (not to mention the logic necessary to pick trial
divisors intelligently) go away completely in binary.
Similarly, when used for binary multiplication, the classical
algorithm can be simplified into a short and sweet
shift-and-add loop.

"Ah yes," I can almost hear you saying, "the good old
shift-and-add and shift-and-subtract methods of multiplication
and division." Why, even at this considerable distance, can I
almost hear you saying this? Because of all the times I've
muttered it to myself, of course! We all are familiar with
these types of routines, and we feel instinctively that we
understand how they work, or could understand easily if we only
bothered to try. We have day-to-day experience with using a
left shift for a fast multiply by 2 and a right shift for a
fast divide by 2. We've all taken the commonly used
multiply-by-10 shortcut that relies on a couple of shifts and
an add.
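
In case the multiply-by-10 shortcut is unfamiliar, it rests on the identity 10x = 8x + 2x, so two left shifts and one add do the job. A minimal C sketch follows; the function name times10 is made up for this example.

```c
/* Multiply by 10 using two shifts and an add: 10x = 8x + 2x. */
unsigned times10(unsigned x)
{
    return (x << 3) + (x << 1);
}
```

As with any unsigned arithmetic in C, the result wraps around if 10x does not fit in an unsigned int.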

But few of us are actually ever called 
upon to write one of these multiplication or 
division routines, and in practice they are 
not quite as "obvious" as we fondly imagine.
On the other hand, there is certainly
nothing magical about such routines; they 
turn out to be quite straightforward when 
given the usual attention to detail. 

In this column, I'll provide cookbook 
methods for writing multiplication and di- 
vision routines that will serve you well on 
any reasonable CPU (the nasty CPUs that
use 1's-complement arithmetic or lack a
carry flag are better avoided than conquered),
and then I'll illustrate these methods
with working code.

RECIPE FOR SHIFT-AND-ADD MULTIPLY 

The following procedure assumes that you 
are multiplying two arguments (sometimes 
called the multiplier and multiplicand) that 
are the same length (in bytes) to obtain a 
product that is twice the length of either ar- 
gument. The arguments and the product 
are further assumed to be unsigned; han- 
dling arithmetic signs and checking for 
zero arguments is best done in a "shell" 
routine external to the fundamental multi- 
plication procedure. This allows routines 
that need maximum speed and that have 
control over their arguments to call the un- 
signed routine directly, achieving best per- 
formance. Lastly, it is assumed that your 
CPU has a carry flag that is under direct 
program control, and that it has both right 
and left shift instructions that work togeth- 
er with the carry flag, allowing you to re- 
move a bit from one byte and insert it in an- 
other. 

Given these assumptions, the steps in 
the recipe are as follows: 

(1) Initialize the high half of the buffer that
will receive the product to 0. (The low half 
will be discarded by shifting, so its original 
value is unimportant.) 

(2) Initialize the loop counter to 8 times the 
length of each argument (in bytes); this is 
the number of binary "digits" (bits) in the 
multiplier that must be tested. 

(3) Clear the carry flag. 

(4) Logical right-shift the buffer that con- 
tains the forming product by one bit posi- 
tion; the value that is in the carry flag be- 
comes the new most significant bit of the 
product. 

(5) Logical right-shift the buffer that con- 
tains the second argument (the multiplier) 
by one position; the "lost" bit shifted out 
is saved in the carry flag. 

(6) If the carry flag is clear (that is, if the bit 
shifted out of the multiplier was 0), go to 
step 8. 

(7) If the carry flag is set (that is, if the bit
shifted out of the multiplier was 1), add the
first argument (the multiplicand) to the 
high half of the forming product. Any 
overflow of this addition is saved in the 
carry flag. 

(8) Decrement the loop counter, preserv- 
ing the carry flag; if the loop counter is 
nonzero, go to step 4 and continue. 
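
The recipe can be modeled in C to see how the pieces fit together. The sketch below is one possible reading of steps 1 through 8, not a transcription of the assembly listing: the names shr1, add_bytes, and shift_add_mul are invented here, an ordinary variable stands in for the carry flag, and buffers are stored least significant byte first. One assumption deserves emphasis: after the loop, the sketch shifts the product right one more time so that the final carry and the last partial product land in position. This appears to correspond to the "bits per argument + 1" loop count used in the MPMUL2 listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shift buf (len bytes, least significant byte first) right one bit.
   carry_in becomes the new top bit; the bit shifted out is returned. */
static unsigned shr1(uint8_t *buf, size_t len, unsigned carry_in)
{
    for (size_t i = len; i-- > 0; ) {
        unsigned out = buf[i] & 1u;
        buf[i] = (uint8_t)((buf[i] >> 1) | (carry_in << 7));
        carry_in = out;
    }
    return carry_in;
}

/* dst += src over len bytes; returns the carry out of the top byte. */
static unsigned add_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    unsigned k = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned t = (unsigned)dst[i] + src[i] + k;
        dst[i] = (uint8_t)t;
        k = t >> 8;
    }
    return k;
}

/* prod (2*n bytes) = mcand * mplier (n bytes each, unsigned).
   The multiplier buffer is shifted to zero in the process. */
void shift_add_mul(const uint8_t *mcand, uint8_t *mplier,
                   uint8_t *prod, size_t n)
{
    unsigned carry = 0;                             /* step 3 */

    memset(prod, 0, 2 * n);     /* step 1 (low half cleared too, for tidiness) */

    for (size_t bits = 8 * n; bits > 0; bits--) {   /* steps 2 and 8 */
        (void)shr1(prod, 2 * n, carry);             /* step 4 */
        carry = shr1(mplier, n, 0);                 /* step 5 */
        if (carry)                                  /* step 6 */
            carry = add_bytes(prod + n, mcand, n);  /* step 7 */
    }
    (void)shr1(prod, 2 * n, carry);     /* final alignment shift (assumption) */
}
```

Multiplying FFFFh by FFFFh with n = 2 leaves the bytes 01h, 00h, FEh, FFh in prod[], the same FFFE0001h that byte-wise multiplication produces.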



MPMUL2.ASM    1 of 2

        title   MPMUL2.ASM Multiple-Precision Unsigned Multiply
        page    55,132
;
; MPMUL2.ASM    Multiple-Precision Unsigned Multiply
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode.
;               This version uses "shift and add" method.
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of source operand
;               ES:DI = address of destination operand
;               CX    = operand length in bytes
;
;               Assumes direction flag is clear at entry
;               Assumes DS = ES <> SS
;               Assumes 0 < CX <= 255
;
; Returns:      ES:DI = address of product
;
;               NOTE: Buffer for destination operand must be
;               twice as long as the actual operand, because
;               it will receive a double-precision result.
;
; Destroys:     AX (other registers preserved)
;
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpmul2
mpmul2  proc    near

        push    bx                      ; save registers
        push    cx
        push    dx
        push    bp

        push    di                      ; save addr of dest argument
        mov     dx,cx                   ; save bytes/operand

        add     di,cx                   ; find address of high half
        mov     bp,di                   ; of product, save it in BP

        xor     al,al                   ; initialize high half of
        rep stosb                       ; forming product to zero

        pop     di                      ; retrieve addr of dest arg

        mov     cx,dx                   ; CX = bits per argument + 1
        shl     cx,1
        shl     cx,1
        shl     cx,1
        inc     cx

        clc                             ; initialize carry

mpmul21: pushf                          ; save carry flag
        mov     bx,dx                   ; BX = bytes in product
        shl     bx,1
        dec     bx
        popf                            ; restore carry flag

mpmul22: rcr    byte ptr es:[di+bx],1   ; shift forming product and
        dec     bx                      ; dest operand right 1 bit
        jns     mpmul22                 ; loop while BX >= 0
        jnc     mpmul24                 ; jump if bit shifted out = 0

                                        ; bit shifted out = 1
        xchg    bp,di                   ; DI = high half of product
        push    cx                      ; save bit counter
        mov     cx,dx                   ; CX = bytes per argument
        xor     bx,bx                   ; init index (also clears carry)

mpmul23: mov    al,[si+bx]              ; add source argument to high
        adc     es:[di+bx],al           ; half of forming product

Figure 1: A general-purpose unsigned multiplication routine for multiple-precision integers. This
version uses a binary shift-and-add approach that does not exploit the CPU's native hardware
multiply. Compare with the MPMUL1.ASM listing published in our previous issue, which carries out
multiplication in a byte-wise fashion using the CPU's MUL instruction.

To understand what's going on here,
just think back to the longhand technique 
for multiplying decimal numbers. Each 
digit of the multiplicand is multiplied by 
each of the digits of the multiplier to obtain 
a set of partial products. After appropriate 
shifting, the partial products are added to- 
gether to form the final product. 

In binary multiplication, each "digit" 
of the multiplier can only be a 0 or a 1, so
each "partial product" that needs to be accumulated
is either 0 or the appropriately
shifted value of the multiplicand. The rest 
is just trickery to make everything end up 
in the correct position. 

SHIFT-AND-SUBTRACT DIVIDE 

In the next procedure, the assumption is that you are dividing
an unsigned dividend by an unsigned divisor to get an unsigned
quotient and an unsigned remainder. The dividend is further
assumed to be twice the length (in bytes) of the divisor; both
remainder and quotient are the same length as the divisor.

Handling signs, zero divisors, and other odd conditions outside
the core unsigned division routine has advantages.

Again, signs, zero divisors, overflow, 
and other odd conditions should be han- 
dled outside the core unsigned division 
routine; this allows routines that require 
maximum speed and that have control over 
their arguments to call the unsigned rou- 
tine directly. Finally, it is assumed that the 
characteristics (shifts and carry-flag con- 
trol) demanded of the CPU for the shift- 
and-add multiplication routine also apply 
for the shift-and-subtract divide routine. 
The steps in the recipe become: 
(1) Set the loop counter to the value that is
8 times the length of the divisor (in bytes);
this is the number of bits of quotient and remainder
that need to be generated. The initial
value in the buffer that will receive the
        inc     bx
        loop    mpmul23

        pop     cx              ; restore bit counter
        xchg    bp,di           ; restore dest operand pointer

mpmul24: loop   mpmul21         ; loop until all bits processed

        pop     bp              ; restore registers
        pop     dx
        pop     cx
        pop     bx
        ret                     ; back to caller

mpmul2  endp

_TEXT   ends

        end
        title   MPDIV.ASM Multiple-Precision Unsigned Divide
        page    55,132

; MPDIV.ASM     Multiple-Precision Unsigned Divide
;               using "shift-and-subtract" method
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode.
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of divisor
;               ES:DI = address of dividend
;               CX    = divisor length in bytes
;                       (dividend length = 2 * divisor length)
;
;               Assumes direction flag is clear at entry
;               Assumes DS = ES <> SS
;               Assumes 0 < CX <= 255
;
; Returns:      ES:DI = address of quotient
;               DS:SI = address of remainder
;
;               NOTE: Dividend is assumed to be twice as long
;               as the divisor. Returned remainder and quotient
;               are same size as divisor.
;
; Destroys:     AX (other registers preserved)

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  mpdiv
mpdiv   proc    near

        push    bx              ; save registers
        push    cx
        push    dx
        push    si
        push    di
        push    bp

        mov     dx,cx           ; save divisor length in DX
        mov     bp,cx           ; BP will be outer loop
        shl     bp,1            ; counter, set it to number
        shl     bp,1            ; of bits in divisor
        shl     bp,1

        clc                     ; initially clear carry



Figure 2: A general-purpose unsigned division routine for multiple-precision integers. It carries out 
the operation in binary using a shift-and-subtract loop. 
; MPDIV.ASM (continued)

mpdiv1: push    di              ; save pointer to dividend
        mov     cx,dx           ; CX = words in dividend

mpdiv2: rcl     word ptr [di],1 ; shift carry flag into
        inc     di              ; low bit of quotient,
        inc     di              ; shift high bit of dividend
        loop    mpdiv2          ; into carry flag

        pop     di              ; restore pointer to dividend

        jnc     mpdiv5          ; jump if high bit was clear

mpdiv3: push    si              ; save pointer to divisor
        push    di              ; save pointer to dividend
        add     di,dx           ; DI = addr high half of dividend
        mov     cx,dx           ; CX = bytes in divisor
        clc                     ; initially clear carry

mpdiv4: mov     al,[si]         ; subtract divisor from high
        sbb     [di],al         ; half of dividend
        inc     si
        inc     di
        loop    mpdiv4

        pop     di              ; restore pointer to dividend
        pop     si              ; restore pointer to divisor

        stc                     ; shift bit=1 into quotient
        dec     bp              ; all bits of answer generated?
        jnz     mpdiv1          ; no, loop again
        jmp     mpdiv7          ; yes, go clean up and exit

mpdiv5: push    si              ; save pointer to divisor
        push    di              ; save pointer to dividend
        add     di,dx           ; point to high half of dividend
        mov     cx,dx           ; CX = bytes in divisor
        clc                     ; initially clear carry

mpdiv6: mov     al,[di]         ; high half of dividend >= divisor?
        sbb     al,[si]
        inc     si
        inc     di
        loop    mpdiv6

        pop     di              ; restore pointer to dividend
        pop     si              ; restore pointer to divisor

        jnc     mpdiv3          ; jump, high dividend >= divisor

        clc                     ; shift bit=0 into quotient
        dec     bp              ; all bits of answer generated?
        jnz     mpdiv1          ; no, loop again

mpdiv7: mov     cx,dx           ; CX = bytes in quotient

mpdiv8: rcl     byte ptr [di],1 ; bring final bit into quotient
        inc     di
        loop    mpdiv8

        xchg    si,di           ; copy remainder to final address
        mov     cx,dx
        rep movsb

        pop     bp              ; restore registers
        pop     di
        pop     si
        pop     dx
        pop     cx
        pop     bx
        ret                     ; back to caller

mpdiv   endp

_TEXT   ends
        end
quotient is unimportant because it will be
discarded by shifting during the procedure.

(2) Clear the carry flag.

(3) Left-shift the quotient by one bit position;
the previous value of the carry flag is
inserted into the quotient as the new least
significant bit.

(4) Left-shift the dividend by one bit position;
the bit shifted out is saved in the carry
flag.

(5) If the carry flag is clear, go to step 7.

(6) Subtract the divisor from the upper half
of the dividend. Set the carry flag and go to
step 8.

(7) If the upper half of the dividend is at least
as large as the divisor, go to step 6; otherwise,
clear the carry flag and go to step 8.

(8) Decrement the loop counter, preserving
the state of the carry flag; if the loop
counter is nonzero, go to step 3.

(9) Left-shift the quotient by one bit position,
bringing the carry flag into the quotient
as the final least significant bit. (Moving
this last shift outside the main loop is
not really necessary, but it allows the use
of a slightly more efficient control structure.)
The remainder is whatever is left in
the high half of the dividend.

Again, when attempting to understand 
what is going on in this procedure, it is 
helpful to draw analogies to longhand dec- 
imal division. The important distinction, 
however, is that trial divides are not neces- 
sary when we choose to view each bit as a 
single digit; either the divisor can fit into 
the portion of the dividend we are looking 
at or it can't. We use shifting as a conve- 
nient shortcut to inspecting groups of the 
dividend's digits that are the same length 
as the divisor. The rest is just bookkeeping 
and positioning of the results. 
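
For readers who would like to trace the recipe without an assembler, the steps can be sketched in Python. This is a hedged illustration rather than a translation of MPDIV.ASM: the name `shift_subtract_divide` and its parameters are invented here, Python integers stand in for the byte buffers, and the caller is assumed to have ruled out a zero divisor and overflow (the high half of the dividend must be smaller than the divisor). Like the listing, the sketch lets the quotient accumulate in the low half of the dividend buffer, so steps 3 and 4 become a single rotate through the carry.

```python
def shift_subtract_divide(dividend, divisor, nbytes):
    """Unsigned divide of a 2*nbytes dividend by an nbytes divisor.

    Assumes divisor != 0 and (dividend >> 8*nbytes) < divisor.
    Returns (quotient, remainder), each nbytes long.
    """
    nbits = 8 * nbytes                       # step 1: loop counter
    low_mask = (1 << nbits) - 1
    full_mask = (1 << (2 * nbits)) - 1
    work = dividend & full_mask              # double-length working buffer
    carry = 0                                # step 2: clear carry

    for _ in range(nbits):                   # step 8: repeat nbits times
        # Steps 3 and 4: rotate the whole buffer left through the carry.
        shifted_out = work >> (2 * nbits - 1)
        work = ((work << 1) | carry) & full_mask
        high = work >> nbits
        if shifted_out or high >= divisor:   # steps 5 and 7
            high = (high - divisor) & low_mask        # step 6: subtract
            work = (high << nbits) | (work & low_mask)
            carry = 1                        # quotient bit is 1
        else:
            carry = 0                        # quotient bit is 0

    # Step 9: bring the final bit into the quotient (low half only).
    quotient = ((work << 1) | carry) & low_mask
    remainder = work >> nbits
    return quotient, remainder


print(shift_subtract_divide(100, 7, 1))  # same result as divmod(100, 7)
```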



        title   MPIDIV.ASM Multiple-Precision Signed Divide
        page    55,132

; MPIDIV.ASM    Multiple-Precision Signed Division
;               for Intel 8086, 8088, 80286, and
;               80386 in real mode/16-bit protected mode.
;               Requires MPNEG.ASM (multiple-precision
;               2's complement) and MPDIV.ASM (multiple-
;               precision unsigned integer divide).
;
; Copyright (C) 1989 Ziff Communications Co.
; PC Magazine * Ray Duncan
;
; Call with:    DS:SI = address of divisor
;               ES:DI = address of dividend
;               CX    = divisor length in bytes
;                       (dividend length = 2 * divisor length)
;
;               Assumes direction flag is clear at entry
;               Assumes DS = ES <> SS
;               Assumes 0 < CX <= 255
;
; Returns:      ES:DI = address of quotient
;               DS:SI = address of remainder
;
;               NOTE: Dividend is assumed to be twice as long
;               as the divisor. Returned remainder and quotient
;               are same size as divisor.
;
;               The sign of the quotient is positive if the
;               signs of the dividend and divisor are the same;
;               negative if they are different. The sign of the
;               remainder is the same as the sign of the dividend.
;
; Destroys:     AX (other registers preserved)

_TEXT   segment word public 'CODE'

        extrn   mpdiv:near
        extrn   mpneg:near

        assume  cs:_TEXT

        public  mpidiv
mpidiv  proc    near

        push    bx              ; save registers

        mov     bx,cx           ; get Exclusive-OR of
        mov     al,[si+bx-1]    ; signs of operands
        add     bx,bx
        xor     al,[di+bx-1]
        pushf                   ; save sign of result

        mov     al,[di+bx-1]    ; test sign of dividend
        or      al,al
        pushf                   ; save sign of remainder

        jns     mpid1           ; jump if dividend positive

        push    si              ; save pointer to divisor
        push    cx              ; save length of divisor
        mov     si,di           ; point to dividend
        add     cx,cx           ; calc length of dividend
        call    mpneg           ; flip sign of dividend
        pop     cx              ; restore length of divisor
        pop     si              ; restore address of divisor

mpid1:  mov     bx,cx           ; check if divisor negative
        test    byte ptr [si+bx-1],80h
        jz      mpid2           ; jump, divisor is positive

        push    di              ; save pointer to dividend
        call    mpneg           ; flip sign of divisor
        pop     di              ; restore pointer to dividend

mpid2:  call    mpdiv           ; perform unsigned divide

        popf                    ; retrieve sign of remainder
        jns     mpid3           ; jump, remainder is positive



Figure 3: A general-purpose signed division routine for multiple-precision integers. This routine 
requires MPDIV.ASM (Figure 2) and MPNEG.ASM. 
; MPIDIV.ASM (continued)

        push    di              ; save pointer to quotient
        call    mpneg           ; flip sign of remainder
        pop     di              ; restore pointer to quotient

mpid3:  popf                    ; retrieve sign of result
        jns     mpid4           ; jump, result is positive

        push    si              ; save pointer to remainder
        mov     si,di           ; point to quotient
        call    mpneg           ; flip sign of quotient
        pop     si              ; restore pointer to remainder

mpid4:  pop     bx              ; restore register
        ret                     ; back to caller

mpidiv  endp

_TEXT   ends
        end









MULTIPLE-PRECISION ROUTINES 

Figures 1 through 3, plus MPIMUL.ASM and MPNEG.ASM presented
last time, contain the source code for assembly language
procedures that illustrate what we've been discussing here and
that round out our battery of multiple-precision arithmetic
routines. The calling procedures and results of each routine
are documented in the listings.

MPMUL2.ASM, shown in Figure 1, is the unsigned
multiple-precision-integer multiplication routine that uses the
shift-and-add technique. You may find it instructive to compare
this code with the MPMUL1.ASM published here in the previous
issue. The latter used the CPU's native 8-bit-by-8-bit
multiply, and you may wish to run some timing comparisons of
the two routines. When running benchmark tests, remember that
there are drastic differences in the cost of a hardware
multiply as you progress from the 8086/88 to the 80386 and
80486.

MPIMUL.ASM is the signed multiple-precision multiply routine.
It checks the signs of the arguments to determine the sign of
the eventual result, changes arguments from negative to
positive if necessary (using MPNEG.ASM), then calls MPMUL2.ASM
to do the hard work.

MPDIV.ASM, shown in Figure 2, is 
the unsigned multiple-precision divide 
routine that implements the shift-and-sub- 
tract technique described earlier. If you're 
feeling spunky, read Knuth (volume 2, 
pages 237-38) and code a new version of 
this routine that exploits your CPU's na- 
tive DIV instruction. 
MPIDIV.ASM, shown in Figure 3, is the signed multiple-precision
divide routine that checks and changes signs of arguments and
results, much in the same way as MPIMUL.ASM. It calls MPNEG.ASM
and MPDIV.ASM. Note that calls to MPIDIV.ASM should be avoided
if you know that the sign of your arguments and results is not
important (for example, when manipulating addresses), since
MPIDIV is slower than MPDIV.
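
The sign handling that MPIDIV wraps around the unsigned core can be sketched in Python. This is an illustration only: `signed_divide` is a made-up name, Python's `abs` stands in for MPNEG, and `divmod` on non-negative operands stands in for the unsigned MPDIV routine. The sign rules are the ones documented in the Figure 3 listing header: the quotient is negative when the operand signs differ, and the remainder takes the sign of the dividend.

```python
def signed_divide(dividend, divisor):
    """Signed divide built on an unsigned core (divisor must be nonzero)."""
    quotient_negative = (dividend < 0) != (divisor < 0)   # XOR of the signs
    remainder_negative = dividend < 0

    # Make both operands positive, then do the unsigned divide.
    quotient, remainder = divmod(abs(dividend), abs(divisor))

    # Restore the signs afterward.
    if quotient_negative:
        quotient = -quotient
    if remainder_negative:
        remainder = -remainder
    return quotient, remainder


print(signed_divide(-7, 2))
```

Note that this convention truncates toward zero, which differs from Python's own `//` and `%` operators on negative numbers.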

I've tried to make these routines reason- 
ably efficient, though to keep them from 
diverging too far from the recipes present- 
ed above, I have forgone a number of opti- 
mizations that I would use in a production 
program. Once you're sure you under- 
stand the code, you can entertain yourself 
for hours by tuning it up further. Just be- 
ware of introducing machine instructions 
that affect the carry flag! 

I've also written two interactive demonstration programs,
TRYMPMUL.ASM and TRYMPDIV.ASM, that will facilitate your
experiments. These programs prompt you for arguments, call the
appropriate multiply or divide routine, then display the
results. Because of their length, TRYMPMUL and TRYMPDIV are not
printed here, but both are available for downloading from PC
MagNet.

THE IN-BOX 

Please send your questions, comments, 
and suggestions to me at any of the follow- 
ing e-mail addresses: 
PC MagNet: 72241,52 
MCI Mail: rduncan 

BIX: rduncan ■ 
Power Programming

Arithmetic Routines for Your Computer Programs, Part 4

by Ray Duncan



In the first three columns in this series on 
computer arithmetic we looked at the clas- 
sical algorithms for the four basic integer 
operations. We also devised a set of as- 
sembly language subroutines for multiple- 
precision integer addition, subtraction, 
multiplication, and division. It's time now 
to venture forth from the relatively safe 
shallows of integer arithmetic into the 
treacherous depths of real numbers and 
floating-point arithmetic. 

A grade school student is introduced to the real numbers in
stages. First, he gets the "counting numbers" (positive
integers). Next, the negative numbers are added, completing the
set of all integers. Finally, he comes to fractions and their
decimal equivalents. To make these concepts easier to
visualize, the teacher often uses a "number line": a horizontal
line with arrows at both ends (pointing to negative infinity on
the left and positive infinity on the right) and with a
hashmark (signifying zero) in the middle. Once the integers
(. . . -3, -2, -1, 0, 1, 2, 3 . . .) are charted on the number
line, it's a fairly easy jump to the notion that there are
"numbers between the numbers." The fraction 1/2 is symbolized
by putting a dot on the number line halfway between the 0 and
the 1, for example. From there it's but another step to the
realization that there are an infinite number of numbers
between any two arbitrary points on the number line, the whole
comprising the set of real numbers.

At some point in high school (if he chooses algebra, chemistry,
and physics over machine shop and varsity athletics) our model
student is taught "scientific" or "exponential" notation. This
is a tool that allows him to write down real numbers of any
desired size or precision. For example, the fraction 1/4 can be
expressed in scientific notation as 2.5*10^-1.

The "2.5" portion of the notation is called the mantissa or
fraction or significand. It has one nonzero digit to the left
of the decimal point, and the number of digits after the
decimal point indicates the degree of precision to which the
number's value is known. (A mantissa in this form is said to be
"normalized.") The "10^-1" portion is called the exponent or
characteristic; it specifies the location of the decimal point
in the number. Teachers have a whole cookbook full of rules for
operations on numbers written in scientific notation, such as:
"To find the product of two numbers, multiply the mantissas and
sum the powers of the exponents."

With such mastery of the real numbers 
and the means to manipulate them in hand, 
our student, now a regular whiz kid, may 
be tempted to adopt a somewhat smug atti- 
tude toward matters mathematical. Never 
mind; his prematurely optimistic outlook 
will be demolished when he's confronted 
with imaginary numbers, irrational num- 
bers, complex numbers, infinitesimals, 
and all the other counterintuitive mathe- 
matical things that go bump in the night. 

FLOATING POINT ON COMPUTERS 

While the methods we use to manipulate floating-point numbers
on computers are certainly based on the fundamental rules and
algorithms we were all taught way back when, there are several
important differences we must bear in mind.

For one thing, most computers and 
high-level language libraries support only 
a limited number of floating-point formats 
(typically only two), and these formats, by 
nature, have a finite precision and range. 
This means that you cannot possibly repre- 
sent every real number as a floating-point 
number on your computer. In fact, the 
number of numbers that you can't express 
is infinitely larger than the number of num- 
bers that you can represent. When your 
CPU or compiler was designed, its cre- 
ators picked a floating-point data format 
(or formats) they felt would be sufficient 
for most "normal" applications and yet 
could be implemented reasonably effi- 
ciently. If the requirements of your partic- 
ular application program fall outside the 
bounds foreseen by those designers, you 
either have to roll your own floating-point 
routines or do without. 

Not only are your computer's floating-point numbers a minuscule
subset of the set of real numbers, they do not map onto the
real numbers in a uniform way. For example, if you plot the
numbers that can be represented by a 32-bit integer onto the
real number line, you'll see a set of points that march
monotonically along the line from -2,147,483,648 (-2^31) to
2,147,483,647 (2^31-1). But if you now plot the numbers that
can be represented by a 32-bit floating-point number onto the
real number line, you'll be in for a surprise. The number of
numbers that can be represented in the floating-point format is
exactly the same as for the integer format (think about it!).
But, as illustrated in Figure 1, the floating-point numbers are
densely clustered around zero, and become increasingly sparse
as the distance from zero increases. Of course, the smallest
and largest floating-point numbers are far smaller (or larger)
than the smallest and largest integer, but this dynamic range
is gained at the expense of precision: there are only so many
bits to go around.
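
The uneven spacing is easy to see by measuring the gap between a single-precision number and its nearest larger neighbor. The following Python sketch is one way to do that, assuming the standard 32-bit IEEE layout as implemented by Python's `struct` module; the helper names `next_single` and `gap_after` are invented for this illustration.

```python
import struct


def next_single(x):
    """Next larger single-precision value after x.

    Assumes x is positive, finite, and exactly representable in 32 bits.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Adding 1 to the bit pattern steps to the adjacent representable value.
    (following,) = struct.unpack(">f", struct.pack(">I", bits + 1))
    return following


def gap_after(x):
    """Distance from x to the next representable single-precision value."""
    return next_single(x) - x


for x in (1.0, 1024.0, 16777216.0):
    print(x, gap_after(x))
```

Near 1.0 the neighbors are 2^-23 apart; by 16,777,216 (2^24) they are a full 2.0 apart, so not even every integer can be represented.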

Still a third set of new considerations arises from the fact
that while people prefer to compute in base 10, computers find
base 2 (binary) much more to their liking. As application
designers and programmers, we like to keep everybody happy
("from each according to his abilities, to each according to
his needs," as the famous coder Karl Marx enjoined in his 1875
tutorial, "Criticism of the Gotha Programme"). To do so, each
time a number is input or output it must be converted from
decimal to binary or vice versa. This caused no real problems
when we were working with integers. It becomes a thorny issue
with floating point, however, because some apparently quite
ordinary decimal numbers cannot be expressed exactly in binary
floating point (one such number is 1.0*10^-1).
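
A short Python sketch makes the point concrete. It rounds a value to single precision and compares the stored result with the exact fraction; the helper names `as_single` and `stored_exactly` are invented for this example, and the 32-bit format is the one provided by Python's `struct` module.

```python
import struct
from fractions import Fraction


def as_single(x):
    """Round x to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def stored_exactly(numerator, denominator):
    """True if numerator/denominator survives conversion to single precision."""
    exact = Fraction(numerator, denominator)
    stored = Fraction(as_single(numerator / denominator))
    return stored == exact


print(stored_exactly(1, 10))        # one tenth: a repeating fraction in binary
print(stored_exactly(1, 4))         # one quarter: a power of two
print(Fraction(as_single(0.1)))     # the value that is actually stored
```

Fractions whose denominators are powers of two convert cleanly; one tenth does not, because it has no finite binary expansion.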

BINARY FLOATING-POINT DATA FORMATS 

Once a decimal floating-point number has been converted to a
normalized binary floating number, it can be thought of as
having the form

    1.bbbbb... * 2^n

where each bit b in the mantissa is a zero or a one. The
mantissa is normalized by adjusting the exponent so that the
most significant one bit is to the left of the binary point; in
other words, the mantissa is always greater than or equal to 1
and less than 2.

How are these floating-point numbers actually stored in memory?
The history of this topic is one that begins in utter chaos but
(for once) has a happy ending. In the early days of computing,
a thousand flowers bloomed and a thousand schools of thought
contended, with the inevitable result that nearly every
compiler and CPU used a different floating-point data format.
This made it very difficult to transport data from one machine
to another, or even from a program written in one high-level
language to one written in another.

HOW BITS REPRESENT NUMBERS IN FLOATING-POINT FORMAT

Figure 1: While the number of numbers that can be represented by a given number of bits is the same
for integer and floating-point formats, in floating point the range of numbers is greater, but they occur at
increasing intervals as they get further away from zero.

STANDARD FLOATING-POINT FORMATS

Single-precision (32 bits)
    Bit 31      Bits 30-23      Bits 22-0
    sign        exponent        mantissa

Double-precision (64 bits)
    Bit 63      Bits 62-52      Bits 51-0
    sign        exponent        mantissa

Figure 2: The single-precision and double-precision floating-point formats specified by the
ANSI/IEEE 754 Standard for Binary Floating-Point Arithmetic. The Standard also specifies
"extended" single and double formats, which are not discussed here.

Fortunately, in the late 1970s, a con- 
certed effort to standardize binary floating- 
point arithmetic was begun, first under the 
auspices of the ACM, and later under the 
IEEE Computer Society. This undertaking 
drew upon several proposals, the most im- 
portant of which was the so-called KCS 
Proposal (written by Kahan, Coonen, and
Stone in 1978). These proposals, in turn, 
represented an integration of concepts and 
techniques that dated back to the earliest 
days of computer science. The IEEE com- 
mittee's work resulted in the publication in 
1981 of the draft IEEE 754 Standard for 
Binary Floating-Point Arithmetic. This 
was adopted (in a slightly modified form) 
as an official ANSI/IEEE Standard in 
1985. 



The IEEE 754 Standard was principally 
directed at making floating-point calcula- 
tions safe and predictable for programmers 
untrained in numeric analysis (that's near- 
ly all of us!). It did this by specifying, in 
great detail, the degree of accuracy to 
which computations must be carried out, 
rounding behavior, error and exception 
handling, and the results of the basic float- 
ing-point arithmetic operations, compari- 
sons, and conversions. The Standard also 
specified binary formats for floating-point 
numbers, formats that were rapidly adopt- 
ed by the industry and are now widely sup- 
ported in hardware and software. 

The two most important floating-point data formats described in
the IEEE 754 Standard are shown in Figure 2. The
single-precision format occupies 32 bits (a double word for
Intel processors). The double-precision format requires 64 bits
(a quad word for Intel CPUs). Both formats consist of three
fields: a sign bit, which is always the most significant bit,
followed by the binary exponent, with the mantissa in the
remaining, least significant bits. Single-precision numbers can
take on values in the (approximate) range ±1.18*10^-38 through
±3.40*10^38; double-precision numbers lie in the range
±2.23*10^-308 to ±1.80*10^308.

The sign bit is 1 if the number is negative and 0 if the number
is positive. The mantissa is unsigned and does not change with
the sign of the floating-point number. Because the mantissa is
left-normalized, its most significant bit is (by definition)
always 1. Consequently, the IEEE 754 designers pulled off a
neat trick: they specified that the mantissa has an "implied
leading bit," which is always 1 and is not present in the
actual data. This allows an extra bit of precision to be
squeezed out of each floating-point format.

The exponent field in both types of floating-point numbers is
"biased"; that is, offset from zero by a fixed amount. For
single-precision numbers, which have an 8-bit exponent, 127
(7Fh) in the exponent field corresponds to a true exponent
value of 0. Double-precision numbers, which have an 11-bit
exponent field, use an exponent bias of 1,023 (3FFh). This bias
allows the reciprocal of any normalized floating-point number
to be represented without underflow. The relative sizes of the
exponent fields in the two formats were chosen so that they
would allow a double-precision number to accommodate the
product of as many as eight single-precision numbers without
the possibility of overflow.

The exponent of an IEEE 754 floating-point number can also take
on two "magic" values, causing the number to be handled
specially. If all bits of the exponent are zero, then the
floating-point number is either zero or a "denormalized"
number, the result of a "graceful underflow" (more about this
in a later installment). If all bits of the exponent are set,
then the floating-point number represents either infinity or a
special signalling value: NaN (Not a Number).
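
The special cases can be told apart by examining the exponent and mantissa fields of the raw bit pattern. Here is a minimal Python sketch for the single-precision format; the function name `classify_single` and the category strings are invented for this illustration, and the field positions are the ones described above (8 exponent bits above 23 mantissa bits).

```python
def classify_single(bits):
    """Classify a 32-bit single-precision bit pattern by its exponent field."""
    exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF           # 23 stored mantissa bits
    if exponent == 0:
        # All exponent bits zero: zero or a denormalized number.
        return "zero" if mantissa == 0 else "denormalized"
    if exponent == 0xFF:
        # All exponent bits set: infinity or Not a Number.
        return "infinity" if mantissa == 0 else "NaN"
    return "normalized"


for pattern in (0x00000000, 0x00000001, 0x7F800000, 0x7FC00000, 0x41200000):
    print(hex(pattern), classify_single(pattern))
```

The sign bit is ignored, so positive and negative zero (and positive and negative infinity) fall into the same category.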

A couple of practical examples of binary floating-point data
will help clarify the way the standard works. Consider the
4-byte (32-bit), single-precision floating-point number

    41h 20h 00h 00h

We see that the sign bit is 0, the biased exponent is 10000010B
or 82h, and the mantissa (after restoring the "implied leading
bit") is

    101000000000000000000000B

or A00000h. Correcting for the exponent bias, we have
1.010B*2^3, or 10 decimal.

As another example, consider the double-precision
floating-point number

    BFh E0h 00h 00h 00h 00h 00h 00h

which occupies 8 bytes (64 bits). The sign bit is 1, the biased
exponent is 01111111110B or 3FEh, and the mantissa (after
inserting the "implied leading bit") is 10000000000000h.
Correcting for the exponent bias, we have -1.0B*2^-1, or
-0.5 decimal.
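
Both worked examples can be checked mechanically. The Python sketch below pulls the three fields out of each byte sequence and rebuilds the value; `decode_single` and `decode_double` are names invented for this illustration. It assumes the bytes are given most significant first, as printed above (Intel processors store them in the opposite order in memory), and it handles only normalized numbers.

```python
import struct


def decode_single(raw):
    """Split 4 bytes (most significant first) into sign, exponent, mantissa."""
    (bits,) = struct.unpack(">I", raw)
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    mantissa = (1 << 23) | (bits & 0x7FFFFF)      # restore implied leading bit
    # Remove the bias of 127, and account for the 23 bits right of the point.
    value = (-1) ** sign * mantissa * 2.0 ** (biased_exponent - 127 - 23)
    return sign, biased_exponent, mantissa, value


def decode_double(raw):
    """Split 8 bytes (most significant first) into sign, exponent, mantissa."""
    (bits,) = struct.unpack(">Q", raw)
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    mantissa = (1 << 52) | (bits & ((1 << 52) - 1))
    # Remove the bias of 1,023, and account for the 52 bits right of the point.
    value = (-1) ** sign * mantissa * 2.0 ** (biased_exponent - 1023 - 52)
    return sign, biased_exponent, mantissa, value


print(decode_single(bytes.fromhex("41200000")))
print(decode_double(bytes.fromhex("BFE0000000000000")))
```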



The IEEE 754 Standard might well have been less influential had
it not been for Intel's 1980 release of the 8087 numeric
coprocessor for the 8086 and 8088 CPUs. The 8087 provided a
hardware implementation of the entire (draft) IEEE 754
Standard, even down to its most esoteric aspects (such as
supporting two flavors of infinity: affine and projective).
This chip itself quickly became the yardstick by which the
compliance with the impending Standard of all other CPUs,
numeric coprocessors, and software floating-point libraries was
judged. The 8087 also brought an unprecedented (and largely
unanticipated) amount of number-crunching power within the
reach of every microcomputer user. This made it possible to
migrate many demanding minicomputer and mainframe applications
onto personal computers for the first time.

The 8087 was followed by the 80287 numeric coprocessor in 1983, and 
by the 80387 numeric coprocessor in 1987. Designed to work with the 
80286 CPU, the 80287 was the first of the Intel coprocessors to 
support memory protection and multitasking. The 80387, designed to 
work with the 80386 CPU, was enhanced with several powerful new 
trigonometric instructions. Each successive chip supported a larger 
memory address space, and each benefited from the technological 
advances in large-scale integration made during the period. These 
included both an increase in clock speeds and a decrease in the 
number of machine cycles required for each floating-point operation. 
For a typical floating-point instruction mix, the performance of the 
20-MHz 80386/387 combination is about 16 times better than that of a 
5-MHz 8086/8087 duo. 

SPECIAL IEEE-STANDARD EXPONENT VALUES 

Exponent bits   Mantissa bits   Special meaning 
all zero        all zero        floating-point zero 
all zero        nonzero         denormalized floating-point number 
                                (usually result of "graceful underflow") 
all set         all zero        infinity 
all set         nonzero         "Not a Number" or "NaN" (various reserved 
                                mantissa values are used to signal 
                                overflow, unrecoverable underflow, 
                                invalid operands, invalid result, 
                                inexact result, and so on) 

Figure 3: The IEEE 754 Standard reserves certain exponent values. Floating-point numbers with all 
bits zero or all bits set in the exponent field are trapped and receive special treatment. 

The 8087, 80287, and 80387 chips are 
called "coprocessors" because they are 
closely coupled to the system's CPU, have 
a highly specialized instruction set, and 
cannot function alone. By sharing the 
same data and address bus as the main 
CPU, the coprocessors monitor the CPU's 
instruction stream as it is fetched from 
memory and flows by on the data bus. 
Floating-point instructions begin with a 
special "escape code" that is recognized 
and acted upon by the numeric coproces- 
sor; the CPU essentially ignores the float- 
ing-point instructions except to perform 
address calculations on behalf of the nu- 
meric coprocessor when they are needed. 

The 80x87 coprocessors were not the 
first floating-point arithmetic chips avail- 
able for use with microprocessors. A num- 
ber of early 8080, Z-80, and even 
8086/88-based microcomputers had sock- 
ets for the AMD 9511 and 9512 chips, 
which supported 32-bit and 64-bit float- 
ing-point operations, respectively. But the 
AMD products were not coprocessors 
(they were addressed through an 8-bit I/O 
port like a peripheral device). Moreover, 
they used nonstandard data formats, were 
slow and clumsy to program, and enjoyed 
little if any support in commercial, mass- 
market software packages. The 80x87 
chips were the first hardware number- 
crunchers that were cheap enough, and 
pervasive enough (thanks to the 8087 
socket built into the very first PC mother- 
board), to motivate mass-market software 
publishers to have their programs check 
for the presence of a numeric coprocessor 
and use the coprocessor if it was available. 

In the next installment, I'll discuss the 
architecture of the 80x87 series in more de- 
tail, and present routines that allow you to 
detect and exploit numeric coprocessors in 
your own programs. 

THE IN-BOX 

Please send your questions, comments, 
and suggestions to me at any of the follow- 
ing e-mail addresses: 
PCMagNet: 72241,52 
MCI Mail: rduncan 
BIX: rduncan 







One of the most forward-looking features 
of the original IBM PC was an undocu- 
mented, empty 40-pin socket located very 
close to the 8088 CPU on the mother- 
board. Despite IBM's lack of official com- 
ment, it didn't take long for people who 
looked at the machine's schematic to de- 
cide that this socket just must have been 
designed for Intel's new 8087 numeric co- 
processor. In due course, some brave soul 
plugged an 8087 into the mystery socket, 
turned on the juice, and breathed a sigh of 
relief when nothing in his precious $5,000 
64K PC went up in smoke as a result. 

Word spread rapidly through the still- 
minuscule community of PC users that the 
IBM PC was 8087-capable, but a couple of 
years passed before large numbers of peo- 
ple started installing 8087s. When the first 
PCs were shipped in late 1981, 8087s were 
still very scarce (many of the available 
chips were buggy, temperature-sensitive 
engineering samples), cost over $400 
each, and were regarded as difficult to pro- 
gram. Indeed, 8087s were considered so 
esoteric that a company called MicroWay 
built a successful business on buying the 
coprocessors in quantity from Intel and re- 
selling them, with installation instructions, 
to anxious end-users who also valued the 
company's friendly telephone support, test 
programs, and hand-coded libraries for a 
few popular compilers. 

The combination of the PC and the 
8087 brought unprecedented number- 
crunching power to desktop computers. 
By itself, the IBM PC was barely faster at 
floating-point calculations than an Apple II 
or a Z-80 machine, but the addition of an 
8087 made the PC powerful enough to 
handle many applications that had former- 
ly required mainframes or minicomputers. 
The 8087 implemented the ANSI/IEEE 
754 1981 Draft Standard for Binary Float- 
ing Point Arithmetic in a single integrated 
circuit. The standardized architecture of 
the PC/8087 combination and the potentially 
enormous user base also provided an 
attractive target market for microcomputer 
software developers, who had shown little 
interest in previous floating-point chips, 
such as the AMD 9511 and 9512. 

Coprocessors first gave PCs the mathematical muscle of minis. Here's 
a look at how they work and how you can work them into your own 
programs. 

The original 8087 worked with either 
the 8086 or 8088 processors and was first 
shipped in 1980. The 80287 was designed 
to work with the 80286 CPU and was the 
first of the Intel coprocessors to support 
memory protection and multitasking; it 
was shipped in 1983. The 80387 was de- 
signed to work with the 80386 CPU and 
was enhanced with several powerful new 
trigonometric instructions. After being 
brought into conformity with the ANSI/IEEE 
754 Standard (as it was finally approved 
in 1985), the 80387 arrived on the 
scene in 1987. The 80387 is the last of its 
line: the 80486 has all of the logic of the 
80387 built in and does not require a separate 
numeric coprocessor at all. The relative 
performance of the various 
80x86/80x87 combinations is summarized 
in Figure 1. 

(Note: Throughout the remainder of 
this column, I'll use "CPU" to refer to any 
member of the Intel 80x86 processor fam- 
ily (8086, 8088, 80286, 80386, or 80486), 
and "coprocessor" to refer to any member 
of the Intel 80x87 family (8087, 80287, or 
80387). Statements intended to apply to a 
specific member of either product family 
will include an explicit reference to a par- 
ticular model number.) 

80x87 ARCHITECTURE 

The 8087, 80287, and 80387 chips are called coprocessors because 
they are closely coupled to the system's CPU, have a highly 
specialized instruction set, and cannot function alone. These 
coprocessors share the same data bus and address bus as the main 
CPU. They also monitor the CPU's instruction stream as it is fetched 
from memory and flows by on the data bus. Floating-point 
instructions all begin with a special escape code that is recognized 
and acted upon by the coprocessor. The CPU essentially ignores the 
floating-point instructions except to perform any needed address 
calculations on behalf of the coprocessor. 

INTEL COPROCESSOR HISTORY 

Processor/                   Typical   First     Relative 
coprocessor                  speed     shipped   performance 
8086 and 8087                5 MHz     1980      1 
80286 and 80287              8 MHz     1983      2.5 
80386 and 80387              20 MHz    1987      16 
80486 with embedded 80387    25 MHz    1989      64 

Figure 1: A comparison of the speeds, dates first available, and relative performance of the various 
combinations of Intel CPUs and their coprocessors. 

The shared instruction stream and the 
fact that many of the more complex co- 
processor instructions can require hun- 
dreds of cycles to complete allow the care- 
ful programmer to achieve a certain 
amount of concurrent processing. A pro- 
gram can load a floating-point number 
onto the coprocessor, issue a floating-point 
square root instruction, and then execute 
several dozen CPU instructions in the 
100+ cycles required for the square root 
operation to complete. The program can 
then resynchronize with the coprocessor 
and either unload the result of the square 
root operation into memory or use that re- 
sult as an argument for the next floating- 
point operation. 

Obviously, such concurrent programming requires careful scheduling 
of operations and counting of cycles so that both processors will be 
kept as busy as possible and will not waste time waiting on each 
other. The required coding is also extremely environment-dependent, 
since instruction timings vary widely among the different models of 
CPUs and numeric coprocessors. 

The instructions understood by 80x87 coprocessors fall into six 
basic categories: data transfer, arithmetic, comparison, 
transcendental, constants, and processor control. These are 
summarized in Figure 2. Mnemonics ending with "P" automatically pop 
one operand off the coprocessor's floating-point register stack; 
mnemonics ending with "PP" pop two operands. The instruction sets of 
the 80287 and 80387 are proper supersets of the 8087; the 80287 and 
80387 will run 8087 application code without modification. However, 
considerably better performance and more powerful programs can be 
obtained by modifying the source code to take advantage of the 
improved synchronization characteristics and instruction sets of the 
80287 and 80387. The generalized tangent, arctangent, sine, and 
cosine instructions, which are unique to the 80387, are particularly 
important in these respects. 

INSTRUCTION CATEGORIES FOR 80X87 COPROCESSORS 

TRANSFER 
FLD, FST, FSTP            Load or store floating-point value 
FILD, FIST, FISTP         Load or store integer value 
FBLD, FBSTP               Load or store binary-coded-decimal (BCD) value 
FXCH                      Exchange two coprocessor registers 

ARITHMETIC 
FADD, FADDP, FIADD        Floating-point add 
FSUB, FSUBP, FISUB        Floating-point subtract 
FSUBR, FSUBRP, FISUBR     Floating-point subtract reversed 
FMUL, FMULP, FIMUL        Floating-point multiply 
FDIV, FDIVP, FIDIV        Floating-point divide 
FDIVR, FDIVRP, FIDIVR     Floating-point divide reversed 
FSQRT                     Floating-point square root 
FSCALE                    Scale one value by another 
FPREM, FPREM1             Partial remainder (FPREM1 on 80387 only) 
FRNDINT                   Round to integer 
FXTRACT                   Extract exponent and mantissa from value 
FABS                      Absolute value 
FCHS                      Change sign of value 

COMPARISON 
FCOM, FCOMP, FCOMPP       Compare two ordered values 
FUCOM, FUCOMP, FUCOMPP    Compare two unordered values (80387 only) 
FTST                      Compare value to zero, set flags 
FXAM                      Test type of value, set flags 

TRANSCENDENTAL 
FPTAN, FPATAN             Tangent and arctangent (partial on 8087 and 
                          80287; generalized on 80387) 
FSIN, FCOS, FSINCOS       Sine and cosine (80387 only) 
F2XM1                     Raise 2 to power and subtract one 
FYL2X                     Multiply value times log2 of another value 
FYL2XP1                   Multiply value times log2 of another value 
                          plus one 

CONSTANTS 
FLDZ                      Load the value zero 
FLD1                      Load the value one 
FLDPI                     Load the value pi 
FLDL2T                    Load the value log2(10) 
FLDL2E                    Load the value log2(e) 
FLDLG2                    Load the value log10(2) 
FLDLN2                    Load the value loge(2) 

PROCESSOR CONTROL 
FINIT                     Initialize coprocessor 
FSTSW                     Store status word 
                          (FSTSW AX form not available on 8087) 
FLDCW, FSTCW              Load or store control word 
FCLEX                     Clear coprocessor exception (error) flags 
FLDENV, FSTENV            Load or store coprocessor environment 
FSAVE, FRSTOR             Save or restore coprocessor state 
FINCSTP, FDECSTP          Increment or decrement coprocessor stack pointer 
FFREE                     Mark floating-point register as empty 
FNOP                      Coprocessor "no-operation" 
FENI, FDISI               Enable or disable coprocessor error interrupts 
                          (8087 only; ignored on 80287 and 80387) 

Figure 2: A summary of the instruction set of the 8087, 80287, and 80387 numeric coprocessors. 

80X87 COPROCESSOR RESOURCES 

Floating-point register stack: R0 through R7, each 80 bits wide 
    (bits 79-64 hold the sign and exponent, bits 63-0 the mantissa) 
Status register (bits 15-0) 
Control register (bits 15-0) 
Tag register (bits 15-0) 

All members of the 80x87 family contain eight 80-bit floating-point 
registers, which may be addressed randomly or as a push-down stack; 
a 16-bit status register; a 16-bit control register; and a 16-bit 
"tag" register, which is divided into eight 2-bit fields that 
indicate the type of value in each floating-point register. 

Figure 3: Special coprocessor instruction pointer and data pointer registers are also common 
resources of 80x87 chips. They're located on the CPU and used by floating-point error handlers. 

THE 80X87 COPROCESSOR STATUS REGISTER 

Bit 15          Busy status on 8087 and 80287, 
                same as ES flag on 80387 
Bits 14, 10-8   Condition code (status) from last floating-point 
                operation or comparison (bit values depend on 
                operation type) 
Bits 13-11      Top-of-stack pointer 
Bit 7           Error summary (set if any unmasked exception bit 
                is set) 
Bit 6           Stack fault on 80387, reserved on 8087 and 80287 
Bit 5           Imprecise result, exception occurred 
Bit 4           Result underflow, exception occurred 
Bit 3           Result overflow, exception occurred 
Bit 2           Zero divisor, exception occurred 
Bit 1           Denormalized operand, exception occurred 
Bit 0           Invalid operation, exception occurred 

Figure 4: The read-only word in the coprocessor status register can be examined by programs to 
determine the result of the most recent floating-point operation or the cause of the most recent 
floating-point exception (error). 

As illustrated in Figure 3, the basic on-chip resources that are 
common to all members of the 80x87 family are eight floating-point 
registers, a tag word, a control word, and a status word. The 
floating-point registers are each 80 bits wide and are generally 
used as a push-down stack. (Each register can instead be addressed 
directly by the programmer when necessary.) The tag word is divided 
into eight 2-bit fields, each of which corresponds to a 
floating-point register. Each of these fields indicates whether the 
register is empty or whether the number in that register is valid or 
invalid, a floating-point value, zero, or infinity. 
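
To make the tag word concrete, here is a small Python sketch that splits a 16-bit tag word into its eight 2-bit fields. The field encoding used (00 = valid, 01 = zero, 10 = special, 11 = empty) and the placement of register 0 in the two low-order bits follow Intel's reference manuals rather than anything stated in this column, so treat them as assumptions; the function name is made up for the example.

```python
# Assumed encoding, per Intel's documentation (not given in the column):
TAG_MEANINGS = {
    0b00: "valid",     # ordinary nonzero floating-point value
    0b01: "zero",
    0b10: "special",   # invalid, infinity, or denormalized
    0b11: "empty",
}

def decode_tag_word(tag_word: int) -> list:
    """Return the tag of each physical register, register 0 first."""
    return [TAG_MEANINGS[(tag_word >> (2 * reg)) & 0b11] for reg in range(8)]

if __name__ == "__main__":
    print(decode_tag_word(0xFFFF))   # every register empty
    print(decode_tag_word(0xFFF4))   # register 0 valid, register 1 zero
```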

The status word, which is read-only, 
contains the condition codes for the most 
recent floating-point operation, error indi- 
cators, and also a specification of which 
floating-point register is the current top of 
stack. This is diagrammed in Figure 4. If 
you push too many numbers onto the 
80x87 stack and the top-of-stack pointer 
wraps, an exception is generated. Similar- 
ly diagrammed in Figure 5, the control 
word is read/write and determines the 
rounding mode of the chip and the preci- 
sion to which operations are carried out. 
"Mask" bits in the control word deter- 
mine which types of errors are allowed to 
generate an interrupt. 
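
A program that has copied the status word into an ordinary 16-bit integer can pick it apart with shifts and masks. The Python sketch below follows the layout in Figure 4; the names C0 through C3 for the individual condition-code bits come from Intel's manuals rather than from this column, and the function name is hypothetical.

```python
EXCEPTION_FLAGS = [          # bits 0 through 5 of the status word
    "invalid operation",
    "denormalized operand",
    "zero divisor",
    "overflow",
    "underflow",
    "imprecise result",
]

def decode_status_word(sw: int) -> dict:
    """Split a 16-bit 80x87 status word into its fields."""
    return {
        "exceptions": [name for bit, name in enumerate(EXCEPTION_FLAGS)
                       if sw & (1 << bit)],
        "stack_fault": bool(sw & 0x0040),      # bit 6, 80387 only
        "error_summary": bool(sw & 0x0080),    # bit 7
        "condition_code": {                    # bits 8, 9, 10, and 14
            "C0": (sw >> 8) & 1,
            "C1": (sw >> 9) & 1,
            "C2": (sw >> 10) & 1,
            "C3": (sw >> 14) & 1,
        },
        "top_of_stack": (sw >> 11) & 0b111,    # bits 11 through 13
        "busy": bool(sw & 0x8000),             # bit 15
    }

if __name__ == "__main__":
    # 3804h: zero-divisor flag set, top-of-stack pointer = 7
    print(decode_status_word(0x3804))
```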



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THE 80X87 COPROCESSOR CONTROL REGISTER 



Bits 15-13   Reserved 
Bit 12       Infinity control on 8087 and 80287 
                 0 = projective 
                 1 = affine 
             reserved on 80387 
Bits 11-10   Rounding control 
                 00 = round to nearest or even 
                 01 = round down 
                 10 = round up 
                 11 = truncate toward zero 
Bits 9-8     Precision control 
                 00 = 24 bits 
                 01 = reserved 
                 10 = 53 bits 
                 11 = 64 bits 
Bit 7        Interrupt enable mask on 8087, reserved 
             on 80287 and 80387 
Bit 6        Reserved 
Bit 5        Imprecise result, exception mask 
Bit 4        Result underflow, exception mask 
Bit 3        Result overflow, exception mask 
Bit 2        Zero divisor, exception mask 
Bit 1        Denormalized operand, exception mask 
Bit 0        Invalid operation, exception mask 

Figure 5: The word in the control register can be read or written by application programs to query or 
set the current rounding mode and precision mode and to control error handling. When the exception 
mask bit for a particular error type is set to 1, that error type will not generate an interrupt, and the value 
of the result will be set to a special bit pattern called a NaN (Not-a-Number). 
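
The control word layout in Figure 5 lends itself to the same shift-and-mask treatment. The following Python sketch decodes a control word and shows one way to build a new word with a different rounding mode before it is loaded back with FLDCW. The sample value 037Fh is used purely for illustration (it is an assumption here, not a value quoted in this column), and both function names are invented for the example.

```python
MASK_NAMES = [               # bits 0 through 5: exception masks
    "invalid operation",
    "denormalized operand",
    "zero divisor",
    "overflow",
    "underflow",
    "imprecise result",
]
PRECISION = {0b00: "24 bits", 0b01: "reserved", 0b10: "53 bits", 0b11: "64 bits"}
ROUNDING = {
    0b00: "round to nearest or even",
    0b01: "round down",
    0b10: "round up",
    0b11: "truncate toward zero",
}

def decode_control_word(cw: int) -> dict:
    """Split a 16-bit 80x87 control word into its fields."""
    return {
        "masked": [name for bit, name in enumerate(MASK_NAMES)
                   if cw & (1 << bit)],
        "precision": PRECISION[(cw >> 8) & 0b11],    # bits 8-9
        "rounding": ROUNDING[(cw >> 10) & 0b11],     # bits 10-11
        # Bit 12 is meaningful on the 8087 and 80287 only.
        "infinity": "affine" if cw & 0x1000 else "projective",
    }

def with_rounding(cw: int, mode: int) -> int:
    """Return cw with its rounding-control field replaced by mode (0-3)."""
    return (cw & ~0x0C00) | ((mode & 0b11) << 10)

if __name__ == "__main__":
    print(decode_control_word(0x037F))
    print(hex(with_rounding(0x037F, 0b11)))   # 0xf7f: truncate toward zero
```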






In addition to the registers that are on 
board the coprocessor, there are special 
coprocessor instruction pointer and data 
pointer registers located on the CPU chip 
that are associated with the coprocessor's 
execution of floating-point instructions. 
When a coprocessor error interrupt occurs, 
the interrupt handler can examine these 
registers to determine the location and type 
of the floating-point instruction and/or op- 
erand that caused the problem. (Since the 
CPU and coprocessor generally run asyn- 
chronously, the values in the CPU's own 
instruction pointer and registers are not 
likely to be helpful.) 

80x87 DATA FORMATS 

Intel CPUs and coprocessors communicate primarily by means of 
shared memory. Thus, for example, if you want to use a certain 
number in a calculation and it happens to be in a CPU register, the 
only way to get it to the coprocessor is to store it in RAM first. 
With one exception there is no direct register-to-register 
communication between the CPU and the coprocessor. (The exception is 
that, on the 80287 and 80387, the FSTSW instruction is able to write 
the coprocessor's status word directly into the CPU's AX register. 
This expedites testing the results of a floating-point operation for 
conditional branching.) 

The coprocessor family supports 7 different data types in memory: 
16-, 32-, and 64-bit integers; 80-bit packed binary coded decimal 
(BCD); and 32-, 64-, and 80-bit floating-point numbers. These are 
shown in Figure 6. The 32-bit and 64-bit floating-point formats are 
identical to those defined by the ANSI/IEEE 754 Standard. The 16- 
and 32-bit integer formats are identical to those supported by the 
entire 80x86 family. And the 64-bit integer format is the same as 
the double-precision integers used by the 80386 machine when it is 
running in 32-bit protected mode. Regardless of a number's format in 
memory, however, when it is loaded into a coprocessor floating-point 
register it is always converted to the 80-bit "extended-precision" 
floating-point format (which is also called the "temporary real" 
data type in some Intel manuals). 

Format                        Sign (s)   Exponent (exp)   Mantissa 
Single-precision              bit 31     bits 30-23       bits 22-0 
Double-precision              bit 63     bits 62-52       bits 51-0 
Extended-precision            bit 79     bits 78-64       bits 63-0 

Binary-coded-decimal (BCD)    bit 79     x = bits 78-72   magnitude in bits 71-0 
                                                          (as 18 four-bit BCD digits) 

Figure 6: Floating-point and binary-coded-decimal data formats supported by the Intel 80x87 
numeric coprocessor family in memory. The single-precision and double-precision floating-point 
formats are as specified in the ANSI/IEEE 754 standard. The binary-coded-decimal (BCD) format 
packs two digits per byte, giving a total of 18 digits plus a sign bit in an 80-bit format (x represents 7 bits 
that are unused). The 16-, 32-, and 64-bit integers (not shown here) are normal two's complement 
binary integers with the sign in the most significant bit. Regardless of a number's format in memory, 
when it is loaded into a floating-point register on the coprocessor, it is always converted into the 
extended (80-bit) floating-point format. 

SPECIAL IEEE-STANDARD EXPONENT VALUES 

Exponent bits   Mantissa bits   Special meaning 
all zero        all zero        floating-point zero 
all zero        nonzero         denormalized floating-point number 
                                (usually result of "graceful underflow") 
all set         all zero        infinity 
all set         nonzero         "Not a Number" or "NaN" (various reserved 
                                mantissa values are used to signal 
                                overflow, unrecoverable underflow, 
                                invalid operands, invalid result, 
                                inexact result, and so on) 

Figure 7: Floating-point numbers in which all bits are zero or all bits are set in the exponent field are 
trapped and receive special treatment in accordance with the ANSI/IEEE 754 binary floating-point 
standard. 
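
Of the seven formats, packed BCD is probably the least familiar, so a small sketch may help. The Python functions below build and read back the 10-byte image that FBSTP stores and FBLD loads. They assume the usual Intel byte order, with the two least significant digits in the byte at the lowest address and the sign in the top bit of the last byte; that ordering comes from Intel's manuals, not from this column, and the function names are invented for the example.

```python
def encode_packed_bcd(n: int) -> bytes:
    """Encode an integer of up to 18 decimal digits as 10-byte packed BCD."""
    if abs(n) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 digits")
    digits = f"{abs(n):018d}"            # most significant digit first
    out = bytearray(10)
    for i in range(9):                   # byte 0 = two least significant digits
        high = int(digits[16 - 2 * i])
        low = int(digits[17 - 2 * i])
        out[i] = (high << 4) | low
    out[9] = 0x80 if n < 0 else 0x00     # sign bit; remaining 7 bits unused
    return bytes(out)

def decode_packed_bcd(raw: bytes) -> int:
    """Decode a 10-byte packed BCD image back into an integer."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    value = 0
    for byte in reversed(raw[:9]):       # most significant byte first
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -value if raw[9] & 0x80 else value

if __name__ == "__main__":
    image = encode_packed_bcd(-1234)
    print(image.hex())                   # 34120000000000000080
    print(decode_packed_bcd(image))      # -1234
```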

INIT87.ASM: COMPLETE LISTING

        title   INIT87.ASM initialize Numeric coprocessor
        page    55,132

; INIT87.ASM    Initialize 80x87 Numeric coprocessor
;
; Copyright (C) 1989 Ziff Davis Communications
; PC Magazine * Ray Duncan
;
; Call with:    AX      = control word for desired rounding
;                         mode, precision, exception mask
;
; Returns:      (if coprocessor present)
;               Z flag  = True (1)
;
;               (if coprocessor not found)
;               Z flag  = False (0)
;
; Destroys:     nothing

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  init87
init87  proc    near

        push    bx              ; save registers
        push    ax

        mov     ax,-1           ; put FFFFH on stack
        push    ax

        mov     bx,sp           ; make it addressable

        fninit                  ; try to initialize coprocessor
        fnstsw  ss:[bx]         ; try to get status word

        pop     ax              ; if low 8 bits are zero,
        or      al,al           ; coprocessor is present

        jnz     initx           ; jump if no coprocessor

        fldcw   ss:[bx+2]       ; load coprocessor control word

initx:  pop     ax              ; restore registers
        pop     bx
        ret                     ; and return result in Z flag

init87  endp
_TEXT   ends
        end

Figure 8: This routine tests for the presence of an 80x87 numeric coprocessor and, if one is present, 
configures it for the desired rounding, precision, and error-handling modes. 

The use of the extended-precision 
floating-point format for all operations that are 
internal to the 80x87 coprocessor family 
has some interesting consequences. First, 
since the dynamic range of the 80-bit for- 
mat is so much greater than that of the 32- 
bit or 64-bit floating-point formats typical- 
ly used in program variables, an overflow 
or underflow of a final result is quite rare, 
assuming that all intermediate results are 
maintained on the chip. Second, conver- 
sion of a number from one data type to an- 
other in memory is trivial: you just load the 
original data from memory onto the co- 
processor, and then unload it again in the 
desired format. 
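
The first of these consequences can be imitated in C. The sketch below is only an analogy: it uses float as the "program variable" format and double as a stand-in for the coprocessor's wider internal format, and the function names are invented for the illustration. The narrow version rounds its intermediate product to 32 bits and overflows; the wide version keeps the intermediate in the larger format and rounds only the final result.

```c
#include <float.h>

/* Compute (a * b) / c, forcing the intermediate product back to a
   32-bit float, as if it had been stored to memory between steps. */
float scale_narrow(float a, float b, float c)
{
    volatile float product = a * b;   /* may overflow to +infinity */
    return product / c;
}

/* Same calculation, but the intermediate product is held in a wider
   format (double here, standing in for the 80-bit registers). */
float scale_wide(float a, float b, float c)
{
    double product = (double)a * (double)b;
    return (float)(product / c);      /* round only the final result */
}
```

With a, b, and c all equal to 1e30, scale_narrow returns infinity because 1e60 does not fit in a 32-bit float, while scale_wide returns 1e30 as expected.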

Let's take a closer look at the coprocessor's 
extended floating-point data type, 
which corresponds to the optional double 
extended format of the ANSI/IEEE 754 
Standard. (I discussed the single-precision 
and double-precision floating-point number 
formats here last time.) The dynamic 
range of these numbers is approximately 
±3.4*10^-4932 to ±1.2*10^4932. The 80 bits 
are divided into three fields: a sign bit, a 
15-bit exponent (sometimes called a 
"characteristic"), and a 64-bit mantissa 
(or "significand"). 

DIMUL87.ASM: COMPLETE LISTING

        title   DIMUL87.ASM 80x87-based Signed Multiply
        page    55,132

; DIMUL87.ASM   Double Precision Signed Integer Multiply
;               for 80x87 coprocessor and 8086, 8088, 80286, or
;               80386 in real mode/16-bit protected mode
;
;               Be sure to call INIT87 routine first to test for
;               coprocessor existence and to set rounding mode,
;               precision, and exception masks!
;
; Copyright (C) 1989 Ziff Davis Communications
; PC Magazine * Ray Duncan
;
; Call with:    DX:AX   = double-precision argument 1
;               CX:BX   = double-precision argument 2
;
; Returns:      DX:CX:BX:AX = quad-precision product
;
; Destroys:     nothing

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  dimul
dimul   proc    near

        push    dx                      ; put argument 1 on stack
        push    ax

        push    cx                      ; put argument 2 on stack
        push    bx

        mov     bx,sp                   ; make arguments addressable

        fild    dword ptr ss:[bx]       ; load one argument
        fimul   dword ptr ss:[bx+4]     ; multiply it by the other

        fistp   qword ptr ss:[bx]       ; unload the result
        fwait                           ; wait for it to arrive

        pop     ax                      ; retrieve result
        pop     bx
        pop     cx
        pop     dx
        ret                             ; and exit

dimul   endp

_TEXT   ends

        end

Figure 9: This performs signed integer double-precision multiplication using the coprocessor. 

The sign bit is 1 if the number is negative 
and 0 if the number is positive. The 
mantissa is unsigned and does not change 
with the sign of the floating-point number. 
Because the mantissa is left-normalized, 
its most significant bit is always 1 unless 
the number is zero or unless it is the result 
of an operation that underflowed or had in- 
valid operands. Unlike the single-preci- 
sion and double-precision floating-point 
data formats, no implied leading bit is used 
in the mantissa of the extended-precision 
format. 

The exponent is "biased" (offset 
from zero) by the value 16,383 (3FFFH). 
For example, a value of 16,386 in the 
exponent field corresponds to an exponent 
of 3; in other words, the mantissa is 
multiplied by 2^3. Use of a biased exponent 
ensures that the reciprocal of any normalized 
floating-point number can be represented 
without underflow. The exponent can also 
take on two "magic" values that cause the 
number to be handled in a special way. If 
all bits of the exponent are zero, then the 
number is either zero or is a "denormalized" 
number, the result of a "graceful 
underflow." If all bits of the exponent are 
set, then the floating-point number represents 
either infinity or a special signalling 
value called a NaN (Not-a-Number). 
These combinations and their meanings 
are shown in Figure 7. 
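
To make the field layout concrete, here is a minimal C sketch that splits a 10-byte extended-precision value into its sign, exponent, and mantissa and classifies it according to the rules just described. It assumes the value is stored low byte first, as the 80x87 writes it to memory, and that a denormalized number's true exponent is -16,382 (a detail from Intel's documentation, not from this column). The type and function names are invented for the example.

```c
#include <stdint.h>

#define X87_BIAS 16383

typedef enum {
    X87_ZERO, X87_DENORMAL, X87_NORMAL, X87_INFINITY, X87_NAN
} x87_kind;

typedef struct {
    int      negative;  /* 1 if the sign bit is set */
    int      exponent;  /* true (unbiased) power of two */
    uint64_t mantissa;  /* all 64 bits; the leading bit is explicit */
    x87_kind kind;
} x87_fields;

/* Split a 10-byte extended-precision value (low byte first) into fields. */
x87_fields x87_decode(const uint8_t bytes[10])
{
    x87_fields f;
    unsigned biased = ((unsigned)(bytes[9] & 0x7F) << 8) | bytes[8];
    int i;

    f.negative = (bytes[9] >> 7) & 1;
    f.mantissa = 0;
    for (i = 7; i >= 0; i--)
        f.mantissa = (f.mantissa << 8) | bytes[i];

    if (biased == 0) {              /* all exponent bits clear */
        f.kind = (f.mantissa == 0) ? X87_ZERO : X87_DENORMAL;
        f.exponent = 1 - X87_BIAS;  /* assumed denormal exponent */
    } else if (biased == 0x7FFF) {  /* all exponent bits set */
        f.kind = ((f.mantissa << 1) == 0) ? X87_INFINITY : X87_NAN;
        f.exponent = 0;             /* not meaningful here */
    } else {
        f.kind = X87_NORMAL;
        f.exponent = (int)biased - X87_BIAS;
    }
    return f;
}

/* Approximate a normal value as a double: mantissa * 2^(exponent - 63).
   A double cannot cover the full extended range or precision. */
double x87_to_double(x87_fields f)
{
    double v = (double)f.mantissa;
    int shift = f.exponent - 63;

    while (shift > 0) { v *= 2.0; shift--; }
    while (shift < 0) { v /= 2.0; shift++; }
    return f.negative ? -v : v;
}
```

Using the example from the text, an exponent field of 16,386 (4002H) with a mantissa of 8000000000000000H decodes to an exponent of 3 and a value of 1.0 * 2^3, or 8.0.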

ELEMENTARY COPROCESSOR USAGE 

DIDIV87.ASM: COMPLETE LISTING

        title   DIDIV87.ASM 80x87-based Signed Divide
        page    55,132

; DIDIV87.ASM   Double Precision Signed Integer Divide
;               for 80x87 coprocessor and 8086, 8088, 80286, or
;               80386 in real mode/16-bit protected mode
;
;               Be sure to call INIT87 routine first to test for
;               coprocessor existence and to set rounding mode,
;               precision, and exception masks!
;
; Copyright (C) 1989 Ziff Davis Communications
; PC Magazine * Ray Duncan
;
; Call with:    DX:CX:BX:AX = quad-precision dividend
;               SI:DI       = double-precision divisor
;
; Returns:      DX:AX   = double-precision quotient
;               CX:BX   = double-precision remainder
;
; Destroys:     nothing

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

        public  didiv
didiv   proc    near

        push    dx                      ; put dividend on stack
        push    cx
        push    bx
        push    ax

        push    si                      ; put divisor on stack
        push    di

        mov     bx,sp                   ; make arguments addressable

        fild    dword ptr ss:[bx]       ; put divisor on coprocessor
        fild    qword ptr ss:[bx+4]     ; put dividend on coprocessor

        fld     st(1)                   ; make copies of both
        fld     st(1)

        fdivrp  st(1),st(0)             ; perform signed divide

        fistp   dword ptr ss:[bx]       ; unload quotient

        fprem                           ; calculate remainder

        fistp   dword ptr ss:[bx+4]     ; unload remainder
        fstp    st(0)                   ; discard stack top

        pop     ax                      ; quotient into DX:AX
        pop     dx

        pop     bx                      ; remainder into CX:BX
        pop     cx

        add     sp,4                    ; clean up stack

        ret                             ; and exit

didiv   endp
_TEXT   ends
        end

Figure 10: This performs signed integer double-precision division with the numeric coprocessor. 

Before an application program can make 
use of a coprocessor, it must first verify the 
coprocessor's presence in the host machine 
and then configure it for the desired 
rounding, precision, and error handling 
modes. One method of doing this is illus- 
trated by the procedure INIT87.ASM, 
shown in Figure 8. The routine first 
executes FNINIT, the no-wait form of the 
coprocessor initialization instruction FINIT. 
It then uses FNSTSW, the no-wait form of 
FSTSW, to attempt to transfer the status 
word from the coprocessor into a memory 
word that has been initialized to FFFFH. 
If the low 8 bits of that word come back as 
zero, a coprocessor is present, and the 
procedure sets the desired modes by 
loading the coprocessor control 
word and then returns a status code in the 
CPU's zero flag. 
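
The two decisions INIT87 embodies can be sketched in C: composing the control word that the caller passes in AX, and interpreting the word that FNSTSW may or may not have overwritten. The control-word bit layout assumed here (exception masks in bits 0 through 5, precision control in bits 8 and 9, rounding control in bits 10 and 11) is taken from Intel's documentation rather than from this column, and all of the names are hypothetical.

```c
#include <stdint.h>

/* Rounding-control field values (assumed: bits 10-11 of the control word). */
enum { ROUND_NEAREST = 0, ROUND_DOWN = 1, ROUND_UP = 2, ROUND_CHOP = 3 };

/* Precision-control field values (assumed: bits 8-9 of the control word). */
enum { PREC_24 = 0, PREC_53 = 2, PREC_64 = 3 };

/* Build a control word from rounding mode, precision, and a 6-bit
   exception mask (1 = exception masked). All other bits are left at
   zero in this sketch. */
uint16_t make_control_word(unsigned rounding, unsigned precision,
                           unsigned exception_masks)
{
    return (uint16_t)(((rounding & 3u) << 10) |
                      ((precision & 3u) << 8) |
                      (exception_masks & 0x3Fu));
}

/* Mirror of INIT87's test: the probe word starts out as FFFFH, and a
   coprocessor that responds to FNINIT/FNSTSW leaves its low 8 bits zero. */
int coprocessor_present(uint16_t probe_word)
{
    return (probe_word & 0x00FFu) == 0;
}
```

Under these assumptions, a caller wanting round-to-nearest, 64-bit precision, and all exceptions masked would pass 033FH to INIT87. A probe word still holding FFFFH means that nothing answered.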

The DIMUL87.ASM and DIDIV87.ASM 
routines, which are listed in Figures 9 
and 10, illustrate some basic principles 
of coprocessor programming. They use 
the coprocessor to implement 
double-precision (32-bit) signed integer 
multiplication and division, and are thus 
symmetric with the software-only unsigned 
multiply and divide routines DMUL.ASM 
and DDIV.ASM, published here in our 
November 14, 1989, issue. Note that the 
coprocessor does not directly support 
unsigned integer multiplies or divides, so I 
couldn't code direct equivalents for the 
previous DMUL.ASM and DDIV.ASM 
routines without going to a great deal of 
trouble to handle the upper half of the un- 
signed range. 
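
For checking results by hand, the arithmetic that the two routines perform can be modeled in C. This is a sketch of the expected results only, not a translation of the listings: the names are invented, the range checks have no counterpart in the assembly code, and the division model assumes that INIT87 was called with a control word selecting the "chop" (truncate) rounding mode, so that the quotient is truncated toward zero just as C's integer division is.

```c
#include <stdint.h>

/* The four 16-bit registers that carry a 64-bit result. */
typedef struct { uint16_t dx, cx, bx, ax; } regs64;

/* Model of DIMUL87: 32-bit signed * 32-bit signed = 64-bit signed,
   returned as DX:CX:BX:AX. The product always fits in 64 bits. */
regs64 dimul_model(int32_t arg1, int32_t arg2)
{
    uint64_t bits = (uint64_t)((int64_t)arg1 * (int64_t)arg2);
    regs64 r;

    r.ax = (uint16_t)(bits);
    r.bx = (uint16_t)(bits >> 16);
    r.cx = (uint16_t)(bits >> 32);
    r.dx = (uint16_t)(bits >> 48);
    return r;
}

/* Model of DIDIV87 under the chop-rounding assumption: 64-bit dividend,
   32-bit divisor, 32-bit quotient and remainder. Returns 0 on success,
   -1 if the divisor is zero or the quotient does not fit in 32 bits. */
int didiv_model(int64_t dividend, int32_t divisor,
                int32_t *quotient, int32_t *remainder)
{
    int64_t q;

    if (divisor == 0)
        return -1;
    if (divisor == -1 && dividend == INT64_MIN)
        return -1;                      /* would overflow in C */

    q = dividend / divisor;             /* truncates toward zero */
    if (q < INT32_MIN || q > INT32_MAX)
        return -1;

    *quotient  = (int32_t)q;
    *remainder = (int32_t)(dividend % divisor);  /* sign follows dividend */
    return 0;
}
```

For example, multiplying -3 by 100,000 gives DX:CX:BX:AX = FFFFH:FFFFH:FFFBH:6C20H, and dividing -7 by 2 gives a quotient of -3 and a remainder of -1.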

The interactive demo programs TRY- 
DIMUL and TRYDIDIV illustrate the use 
of INIT87, DIMUL87, and DIDIV87. 
They test the status of the coprocessor, 
prompt you for two arguments, carry out 
the multiplication or division operation on 
the coprocessor, and then display the re- 
sults. The source code for these two demo 
programs can be downloaded from PC 
MagNet. 

In the next issue, we'll proceed to a discussion 
of coprocessor floating-point operations, 
proper use of the WAIT (FWAIT) 
instruction on the various coprocessor 
models, use of the various coprocessor 
rounding modes, and error handling. 

THE IN-BOX 

Please send your questions, comments, 
and suggestions to me at any of the follow- 
ing e-mail addresses: 
PC MagNet: 72241,52 
MCI Mail: rduncan 

BIX: rduncan 